Re: newbie is looking for a customDistribution function

- *To*: mathgroup at smc.vnet.net
- *Subject*: [mg50435] Re: newbie is looking for a customDistribution function
- *From*: koopman at sfu.ca (Ray Koopman)
- *Date*: Fri, 3 Sep 2004 03:36:14 -0400 (EDT)
- *References*: <ch3o86$t96$1@smc.vnet.net> <ch6nlk$2d5$1@smc.vnet.net>
- *Sender*: owner-wri-mathgroup at wolfram.com

koopman at sfu.ca (Ray Koopman) wrote in message news:<ch6nlk$2d5$1 at smc.vnet.net>...
> János <janos.lobb at yale.edu> wrote in message news:<ch3o86$t96$1 at smc.vnet.net>...
> [...]
>> Now, I would like to create a distribution function called
>> twocombsLstDistribution which I could call and it would give me back
>> elements of twocombs with the probability as they occur in distro, that
>> is for on average I would get twice as much {c,a}s as {d,a}s and never
>> get {d.c} or {d,d}.
>>
>> How can I craft that ?
>>
>> /Of course I need it for an arbitrary but finite length string lst over
>> a fixed length alphabet {a,b,c,d,....} for k-length elements of kcombs,
>> and it has to be super fast :). My real lst is between 30,000 and
>> 70,000 element long over a four element alphabet and I am looking for k
>> between 5 and a few hundred. /
>
> For a 4-element alphabet, kcombs will have 4^k terms.
> If k = "a few hundred", kcombs will be too big.
> Why not just sort and count the k-sequences in the data?
>
> In[1]:= data = Table[Random[Integer,{1,4}],{100}]
>
> Out[1]= {2,4,3,3,3,4,3,2,3,3,1,3,2,2,4,1,4,4,4,1,2,3,3,4,1,
>          2,1,4,1,1,2,2,4,3,3,1,2,4,2,3,4,2,2,2,3,4,3,4,3,2,
>          2,3,3,3,1,3,3,1,3,1,1,1,1,4,2,2,3,4,2,4,3,4,3,1,4,
>          4,3,4,4,1,3,2,1,2,4,2,4,1,1,2,3,2,4,3,1,4,3,4,4,1}
>
> In[2]:= With[{k = 3}, Reverse /@ Reverse@Sort@Map[{Length[#],#[[1]]}&,
>          Split@Sort[FromDigits/@Partition[data,k,1]]]]
>
> Out[2]= {{434, 4}, {343, 4}, {331, 4}, {243, 4}, {441, 3}, {313, 3},
>          {234, 3}, {233, 3}, {223, 3}, {433, 2}, {432, 2}, {431, 2},
>          {424, 2}, {422, 2}, {412, 2}, {411, 2}, {344, 2}, {342, 2},
>          {334, 2}, {333, 2}, {322, 2}, {314, 2}, {242, 2}, {241, 2},
>          {224, 2}, {144, 2}, {132, 2}, {124, 2}, {123, 2}, {112, 2},
>          {111, 2}, {444, 1}, {443, 1}, {423, 1}, {414, 1}, {413, 1},
>          {341, 1}, {324, 1}, {323, 1}, {321, 1}, {312, 1}, {311, 1},
>          {232, 1}, {222, 1}, {214, 1}, {212, 1}, {143, 1}, {142, 1},
>          {141, 1}, {133, 1}, {131, 1}, {122, 1}, {121, 1}, {114, 1}}

Having read the other replies, I see that I missed your question, which is
how to generate a random observation from the distribution of k-tuples in
the observed data. By far the easiest way is to take a random k-tuple from
the original data:

    Take[data,{1,k}+Random[Integer,Length@data-k]]

Here Random[Integer,n] returns an integer from 0 to n, so the window
{1,k} is shifted to a uniformly chosen start position and never runs
past the end of data.

However, if you really want to use the distribution instead of the data
that gave rise to it, then you should look into the "Alias Method" of
generating random observations from an arbitrary discrete distribution.
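For readers who do want to sample from the tabulated distribution, here is a minimal sketch of the alias method in Mathematica. It is not code from this thread: the names aliasTable and aliasDraw are hypothetical, the table construction follows the commonly described small/large worklist variant, and it counts k-tuples with Split@Sort as above but keeps them as lists rather than converting them with FromDigits.

```mathematica
(* Build the alias table for a list of nonnegative weights (e.g. counts).
   Returns {prob, alias}: column i keeps outcome i with probability prob[[i]],
   otherwise it yields outcome alias[[i]]. *)
aliasTable[weights_List] := Module[
  {n = Length[weights], p, prob, alias, small, large, s, l},
  p = N[n weights/Total[weights]];           (* scaled so the mean is 1 *)
  prob = Table[1., {n}];
  alias = Range[n];
  small = Select[Range[n], p[[#]] < 1 &];    (* columns that need topping up *)
  large = Select[Range[n], p[[#]] >= 1 &];   (* columns with mass to spare *)
  While[small =!= {} && large =!= {},
    s = First[small]; small = Rest[small];
    l = First[large]; large = Rest[large];
    prob[[s]] = p[[s]]; alias[[s]] = l;      (* fill column s from outcome l *)
    p[[l]] = p[[l]] - (1 - p[[s]]);          (* l gave away 1 - p[[s]] *)
    If[p[[l]] < 1, AppendTo[small, l], AppendTo[large, l]]
  ];
  {prob, alias}
]

(* One draw: pick a column uniformly, then keep it or take its alias. *)
aliasDraw[{prob_List, alias_List}] := Module[
  {i = Random[Integer, {1, Length[prob]}]},
  If[Random[] < prob[[i]], i, alias[[i]]]
]

(* Usage: tabulate the observed k-tuples, then sample one from the table. *)
data = Table[Random[Integer, {1, 4}], {100}];
k = 3;
runs = Split[Sort[Partition[data, k, 1]]];
tuples = First /@ runs;      (* distinct k-tuples *)
counts = Length /@ runs;     (* how often each one occurs *)
table = aliasTable[counts];
tuples[[aliasDraw[table]]]
```

The table is built once, and each later draw costs one random integer and one random real no matter how many distinct k-tuples there are. That only pays off when many observations are drawn from the same table. For a single draw, taking a random k-tuple from the data directly is simpler.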

**Follow-Ups**:
- **Re: Re: newbie is looking for a customDistribution function**, *From:* DrBob <drbob@bigfoot.com>
