MathGroup Archive: September 2004 [00053]

Re: newbie is looking for a customDistribution function

To: mathgroup at smc.vnet.net
Subject: [mg50435] Re: newbie is looking for a customDistribution function
From: koopman at sfu.ca (Ray Koopman)
Date: Fri, 3 Sep 2004 03:36:14 -0400 (EDT)
References: <ch3o86$t96$1@smc.vnet.net> <ch6nlk$2d5$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

koopman at sfu.ca (Ray Koopman) wrote in message 
news:<ch6nlk$2d5$1 at smc.vnet.net>...
> János <janos.lobb at yale.edu> wrote in message news:<ch3o86$t96$1 at smc.vnet.net>...
> [...]
>> Now, I would like to create a distribution function called
>> twocombsLstDistribution that I could call to get back elements of
>> twocombs with the probabilities with which they occur in distro; that
>> is, on average I would get twice as many {c,a}s as {d,a}s and never
>> get {d,c} or {d,d}.
>> 
>> How can I craft that ?
>> 
>> /Of course I need it for an arbitrary but finite length string lst over  
>> a fixed length alphabet {a,b,c,d,....} for k-length elements of kcombs,  
>> and it has to be super fast  :).  My real lst is between 30,000 and  
>> 70,000 element long over a four element alphabet and I am looking for k  
>> between 5 and a few hundred. /
> 
> For a 4-element alphabet, kcombs will have 4^k terms. 
> If k = "a few hundred", kcombs will be too big.
> Why not just sort and count the k-sequences in the data?
> 
> In[1]:= data = Table[Random[Integer,{1,4}],{100}]
> 
> Out[1]= {2,4,3,3,3,4,3,2,3,3,1,3,2,2,4,1,4,4,4,1,2,3,3,4,1,
>          2,1,4,1,1,2,2,4,3,3,1,2,4,2,3,4,2,2,2,3,4,3,4,3,2,
>          2,3,3,3,1,3,3,1,3,1,1,1,1,4,2,2,3,4,2,4,3,4,3,1,4,
>          4,3,4,4,1,3,2,1,2,4,2,4,1,1,2,3,2,4,3,1,4,3,4,4,1}
> 
> In[2]:= With[{k = 3}, Reverse /@ Reverse@Sort@Map[{Length[#],#[[1]]}&, 
>                       Split@Sort[FromDigits/@Partition[data,k,1]]]]
> 
> Out[2]= {{434, 4}, {343, 4}, {331, 4}, {243, 4}, {441, 3}, {313, 3},
>          {234, 3}, {233, 3}, {223, 3}, {433, 2}, {432, 2}, {431, 2},
>          {424, 2}, {422, 2}, {412, 2}, {411, 2}, {344, 2}, {342, 2},
>          {334, 2}, {333, 2}, {322, 2}, {314, 2}, {242, 2}, {241, 2},
>          {224, 2}, {144, 2}, {132, 2}, {124, 2}, {123, 2}, {112, 2},
>          {111, 2}, {444, 1}, {443, 1}, {423, 1}, {414, 1}, {413, 1},
>          {341, 1}, {324, 1}, {323, 1}, {321, 1}, {312, 1}, {311, 1},
>          {232, 1}, {222, 1}, {214, 1}, {212, 1}, {143, 1}, {142, 1},
>          {141, 1}, {133, 1}, {131, 1}, {122, 1}, {121, 1}, {114, 1}}

Having read the other replies, I see that I missed your question,
which is how to generate a random observation from the distribution
of k-tuples in the observed data. By far the easiest way is to take
a random k-tuple from the original data:

          Take[data,{1,k}+Random[Integer,Length@data-k]]
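
This works because each of the Length[data]-k+1 starting positions is
equally likely, so a k-tuple that occurs m times in the data is drawn
with probability m/(Length[data]-k+1). A minimal sketch that wraps the
expression above and tallies many draws follows. The name randomKTuple
is made up for illustration, and data is regenerated as in In[1] only
to keep the example self-contained.

```mathematica
(* same kind of test data as In[1] *)
data = Table[Random[Integer, {1, 4}], {100}];

(* randomKTuple is a hypothetical name; it wraps the Take expression *)
(* above, with the offset range 0 .. Length[data]-k written out      *)
randomKTuple[data_List, k_Integer] :=
  Take[data, {1, k} + Random[Integer, {0, Length[data] - k}]]

(* draw 10000 triples and list {count, value}, most frequent first *)
sample = Table[FromDigits[randomKTuple[data, 3]], {10000}];
Reverse[Sort[{Length[#], First[#]} & /@ Split[Sort[sample]]]]
```

The relative counts should come out roughly proportional to the counts
of the same 3-sequences in the data.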

However, if you really want to use the distribution instead of the data
that gave rise to it, then you should look into the "Alias Method" of
generating random observations from an arbitrary discrete distribution.
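
For reference, a minimal sketch of the alias method might look like the
following. It assumes the probabilities are given as a list summing to
1. The names aliasSetup and aliasDraw are made up for this sketch, and
it is one common way to build the tables (a small/large worklist), not
necessarily the most efficient one in Mathematica. Each draw costs one
random integer and one random real, no matter how many outcomes the
distribution has.

```mathematica
(* aliasSetup and aliasDraw are hypothetical names for this sketch.  *)
(* aliasSetup returns {cut, alias}: cell i yields outcome i with     *)
(* probability cut[[i]] and outcome alias[[i]] otherwise.            *)
aliasSetup[p_List] :=
  Module[{n = Length[p], q, cut, alias, small, large, s, l},
    q = N[n p];                     (* scaled so the average is 1 *)
    cut = Table[1., {n}];
    alias = Range[n];
    small = Select[Range[n], q[[#]] < 1 &];
    large = Select[Range[n], q[[#]] >= 1 &];
    While[small =!= {} && large =!= {},
      s = First[small]; small = Rest[small];
      l = First[large]; large = Rest[large];
      cut[[s]] = q[[s]];
      alias[[s]] = l;
      (* l gives up 1 - q[[s]] of its mass to fill cell s *)
      q[[l]] = q[[l]] + q[[s]] - 1;
      If[q[[l]] < 1, AppendTo[small, l], AppendTo[large, l]]];
    {cut, alias}]

(* pick a cell uniformly, then keep it or switch to its alias *)
aliasDraw[{cut_List, alias_List}] :=
  Module[{i = Random[Integer, {1, Length[cut]}]},
    If[Random[] < cut[[i]], i, alias[[i]]]]

(* usage: tabulate the 3-sequences as {value, count}, then draw one *)
data = Table[Random[Integer, {1, 4}], {100}];
counts = Map[{#[[1]], Length[#]} &,
   Split[Sort[FromDigits /@ Partition[data, 3, 1]]]];
tables = aliasSetup[(Last /@ counts)/(Plus @@ (Last /@ counts))];
counts[[aliasDraw[tables], 1]]
```

The setup is done once per distribution. Using Rest and AppendTo on the
worklists keeps the sketch short but copies the lists, so for a very
large number of outcomes a stack-based setup would be preferable.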

Follow-Ups:
- Re: Re: newbie is looking for a customDistribution function
  - From: DrBob <drbob@bigfoot.com>
