MathGroup Archive: September 2004 [00037]

Re: newbie is looking for a customDistribution function

To: mathgroup at smc.vnet.net
Subject: [mg50418] Re: newbie is looking for a customDistribution function
From: koopman at sfu.ca (Ray Koopman)
Date: Thu, 2 Sep 2004 04:35:10 -0400 (EDT)
References: <ch3o86$t96$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

János <janos.lobb at yale.edu> wrote in message news:<ch3o86$t96$1 at smc.vnet.net>...
> I looked for it in the archives, but found none.  I am looking for ways  
> to create a custom distribution, which I can call as a function.  Here  
> is an example for illustration.  Let's say I have a list created from a  
> 4-element alphabet {a,b,c,d}:
> 
> In[1]:= lst={a,a,b,c,a,d,a,c,c,a}
> 
> Out[1]= {a,a,b,c,a,d,a,c,c,a}
> 
> Distribute gives me - thanks David Park - all the two-element
> combinations of {a,b,c,d}
> 
> In[11]:= twocombs=Distribute[Table[{a,b,c,d},{2}],List]
> 
> Out[11]= {{a,a},{a,b},{a,c},{a,d},{b,a},{b,b},{b,c},{b,d},
>           {c,a},{c,b},{c,c},{c,d},{d,a},{d,b},{d,c},{d,d}}
> 
> I can count the occurrence of an element of twocombs in lst with the  
> following function:
> 
> occuranceCount[x_List] := Count[Partition[lst, 2, 1], x]
> 
> Mapping this function over twocombs gives me the number of occurrences
> of elements of twocombs in lst:
> 
> In[12]:= distro=Map[occuranceCount,twocombs]
> 
> Out[12]= {1,1,1,1,0,0,1,0,2,0,1,0,1,0,0,0}
> 
> It shows that, for example, {c,a} occurs twice, {d,a} occurs once and
> {d,c} and {d,d} never occur.
> 
> Now, I would like to create a distribution function called
> twocombsLstDistribution which I could call and which would give me back
> elements of twocombs with the probabilities with which they occur in
> distro; that is, on average I would get twice as many {c,a}s as {d,a}s
> and never get {d,c} or {d,d}.
> 
> How can I craft that ?
> 
> /Of course I need it for an arbitrary but finite-length string lst over
> a fixed-length alphabet {a,b,c,d,....} for k-length elements of kcombs,
> and it has to be super fast :).  My real lst is between 30,000 and
> 70,000 elements long over a four-element alphabet and I am looking for k
> between 5 and a few hundred. /

For a 4-element alphabet, kcombs will have 4^k terms.
If k = "a few hundred", kcombs will be far too big to build.
Why not just sort and count the k-sequences that actually occur in the
data? There are at most Length[data]-k+1 of them.
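
To put rough numbers on that comparison, here is a small sketch. The sizes are assumptions chosen for illustration: n = 70000 is the upper end of the list length mentioned in the question, and k = 300 stands in for "a few hundred". The name sizes is hypothetical.

```mathematica
(* Assumed sizes, for illustration only: n = list length, k = sequence length *)
sizes = With[{n = 70000, k = 300},
   {N[4^k],       (* number of entries a full kcombs would need *)
    n - k + 1}]   (* number of k-sequences actually present in the data *)
```

The first entry is on the order of 10^180, while the second is 69701, so only the sequences present in the data can realistically be stored and counted.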

In[1]:= data = Table[Random[Integer,{1,4}],{100}]

Out[1]= {2,4,3,3,3,4,3,2,3,3,1,3,2,2,4,1,4,4,4,1,2,3,3,4,1,
         2,1,4,1,1,2,2,4,3,3,1,2,4,2,3,4,2,2,2,3,4,3,4,3,2,
         2,3,3,3,1,3,3,1,3,1,1,1,1,4,2,2,3,4,2,4,3,4,3,1,4,
         4,3,4,4,1,3,2,1,2,4,2,4,1,1,2,3,2,4,3,1,4,3,4,4,1}

In[2]:= With[{k = 3}, Reverse /@ Reverse@Sort@Map[{Length[#],#[[1]]}&, 
                      Split@Sort[FromDigits/@Partition[data,k,1]]]]

Out[2]= {{434, 4}, {343, 4}, {331, 4}, {243, 4}, {441, 3}, {313, 3},
         {234, 3}, {233, 3}, {223, 3}, {433, 2}, {432, 2}, {431, 2},
         {424, 2}, {422, 2}, {412, 2}, {411, 2}, {344, 2}, {342, 2},
         {334, 2}, {333, 2}, {322, 2}, {314, 2}, {242, 2}, {241, 2},
         {224, 2}, {144, 2}, {132, 2}, {124, 2}, {123, 2}, {112, 2},
         {111, 2}, {444, 1}, {443, 1}, {423, 1}, {414, 1}, {413, 1},
         {341, 1}, {324, 1}, {323, 1}, {321, 1}, {312, 1}, {311, 1},
         {232, 1}, {222, 1}, {214, 1}, {212, 1}, {143, 1}, {142, 1},
         {141, 1}, {133, 1}, {131, 1}, {122, 1}, {121, 1}, {114, 1}}
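
The counts above can also be read as the distribution asked for in the question. The following is a minimal sketch, not part of the reply itself: every k-sequence appears in Partition[data,k,1] exactly as often as it is counted, so choosing a start position uniformly at random and taking the k elements from there returns each k-sequence with probability proportional to its count, without building any table. The name drawK is hypothetical; Random[Integer, ...] is the same call used in In[1].

```mathematica
(* drawK is a hypothetical helper, assumed here for illustration.           *)
(* It returns one k-sequence from data, drawn with probability proportional *)
(* to the number of times that sequence occurs among the overlapping        *)
(* windows Partition[data, k, 1]. Assumes 1 <= k <= Length[data].           *)
drawK[data_List, k_Integer] :=
  With[{i = Random[Integer, {1, Length[data] - k + 1}]},  (* uniform start position *)
    Take[data, {i, i + k - 1}]]                            (* the window starting at i *)

(* Usage: five draws of 3-sequences from the data above *)
Table[drawK[data, 3], {5}]
```

Each draw costs one random integer and one Take, so the cost per draw does not grow with the length of data, and sequences that never occur in the data are never returned.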
