Re: newbie is looking for a customDistribution function

• To: mathgroup at smc.vnet.net
• Subject: [mg50435] Re: newbie is looking for a customDistribution function
• From: koopman at sfu.ca (Ray Koopman)
• Date: Fri, 3 Sep 2004 03:36:14 -0400 (EDT)
• References: <ch3o86\$t96\$1@smc.vnet.net> <ch6nlk\$2d5\$1@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```koopman at sfu.ca (Ray Koopman) wrote in message
news:<ch6nlk$2d5$1 at smc.vnet.net>...
> János <janos.lobb at yale.edu> wrote in message news:<ch3o86$t96$1 at smc.vnet.net>...
> [...]
>> Now, I would like to create a distribution function called
>> twocombsLstDistribution which I could call and it would give me back
>> elements of twocombs with the probability as they occur in distro, that
>> is for on average I would get twice as much {c,a}s as {d,a}s and never
>> get {d,c} or {d,d}.
>>
>> How can I craft that ?
>>
>> /Of course I need it for an arbitrary but finite length string lst over
>> a fixed length alphabet {a,b,c,d,....} for k-length elements of kcombs,
>> and it has to be super fast  :).  My real lst is between 30,000 and
>> 70,000 element long over a four element alphabet and I am looking for k
>> between 5 and a few hundred. /
>
> For a 4-element alphabet, kcombs will have 4^k terms.
> If k = "a few hundred", kcombs will be too big.
> Why not just sort and count the k-sequences in the data?
>
> In[1]:= data = Table[Random[Integer,{1,4}],{100}]
>
> Out[1]= {2,4,3,3,3,4,3,2,3,3,1,3,2,2,4,1,4,4,4,1,2,3,3,4,1,
>          2,1,4,1,1,2,2,4,3,3,1,2,4,2,3,4,2,2,2,3,4,3,4,3,2,
>          2,3,3,3,1,3,3,1,3,1,1,1,1,4,2,2,3,4,2,4,3,4,3,1,4,
>          4,3,4,4,1,3,2,1,2,4,2,4,1,1,2,3,2,4,3,1,4,3,4,4,1}
>
> In[2]:= With[{k = 3}, Reverse /@ Reverse@Sort@Map[{Length[#],#[[1]]}&,
>                       Split@Sort[FromDigits/@Partition[data,k,1]]]]
>
> Out[2]= {{434, 4}, {343, 4}, {331, 4}, {243, 4}, {441, 3}, {313, 3},
>          {234, 3}, {233, 3}, {223, 3}, {433, 2}, {432, 2}, {431, 2},
>          {424, 2}, {422, 2}, {412, 2}, {411, 2}, {344, 2}, {342, 2},
>          {334, 2}, {333, 2}, {322, 2}, {314, 2}, {242, 2}, {241, 2},
>          {224, 2}, {144, 2}, {132, 2}, {124, 2}, {123, 2}, {112, 2},
>          {111, 2}, {444, 1}, {443, 1}, {423, 1}, {414, 1}, {413, 1},
>          {341, 1}, {324, 1}, {323, 1}, {321, 1}, {312, 1}, {311, 1},
>          {232, 1}, {222, 1}, {214, 1}, {212, 1}, {143, 1}, {142, 1},
>          {141, 1}, {133, 1}, {131, 1}, {122, 1}, {121, 1}, {114, 1}}

Having read the other replies, I see that I missed your question,
which is how to generate a random observation from the distribution
of k-tuples in the observed data. By far the easiest way is to take
a random k-tuple from the original data:

Take[data,{1,k}+Random[Integer,Length@data-k]]

However, if you really want to use the distribution instead of the data
that gave rise to it, then you should look into the "Alias Method" of
generating random observations from an arbitrary discrete distribution.

```
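
For readers who want to see what the alias method mentioned above might look like, here is a minimal sketch, not code from the original post. It follows the usual small/large pairing construction (often attributed to Vose). The names aliasSetup and aliasDraw are hypothetical, and `data` is assumed to be the list from In[1]. It uses Random[] to match the rest of the thread; newer versions of Mathematica also offer RandomReal and RandomInteger.

```mathematica
(* Hypothetical helper: build the alias tables for a list of   *)
(* non-negative weights w (counts or probabilities).           *)
aliasSetup[w_List] := Module[
  {n = Length[w], p, prob, alias, small, large, s, l},
  p = N[n w/Total[w]];          (* rescale so the mean weight is 1 *)
  prob = Table[1., {n}];        (* chance of keeping column i      *)
  alias = Range[n];             (* fallback outcome for column i   *)
  small = Select[Range[n], p[[#]] < 1 &];
  large = Select[Range[n], p[[#]] >= 1 &];
  While[small =!= {} && large =!= {},
    s = First[small]; small = Rest[small];
    l = First[large]; large = Rest[large];
    prob[[s]] = p[[s]];         (* column s keeps its own mass ... *)
    alias[[s]] = l;             (* ... and borrows the rest from l *)
    p[[l]] = p[[l]] + p[[s]] - 1;
    If[p[[l]] < 1, AppendTo[small, l], AppendTo[large, l]]];
  {prob, alias}]

(* Hypothetical helper: draw one index in constant time. *)
aliasDraw[{prob_List, alias_List}] :=
  With[{i = Random[Integer, {1, Length[prob]}]},
    If[Random[] < prob[[i]], i, alias[[i]]]]

(* Usage, assuming data is the list from In[1] above. *)
k = 3;
runs = Split[Sort[Partition[data, k, 1]]];
tuples = First /@ runs;         (* the distinct k-tuples *)
counts = Length /@ runs;        (* how often each occurs *)
tables = aliasSetup[counts];
tuples[[aliasDraw[tables]]]
```

The setup is done once per distribution; each later draw costs one random integer and one random real, regardless of how many distinct k-tuples there are. The AppendTo calls keep the sketch short but are not the fastest way to maintain the two work lists.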
