MathGroup Archive: September 2004 [00041]

[Date Index] [Thread Index] [Author Index]

Re: Re: newbie is looking for a customDistribution function

To: mathgroup at smc.vnet.net
Subject: [mg50424] Re: [mg50402] Re: newbie is looking for a customDistribution function
From: János <janos.lobb at yale.edu>
Date: Fri, 3 Sep 2004 03:35:10 -0400 (EDT)
References: <ch3o86$t96$1@smc.vnet.net> <200409020834.EAA02053@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Thank you all for your reply - on and off list.   I gained a good  
amount of information by looking at the referenced material.  It is  
amazing to see how much knowledge has been accumulated via this list.   
I will try all suggestions on the weekend.

Thanks again,

János
On Sep 2, 2004, at 4:34 AM, Paul Abbott wrote:

> In article <ch3o86$t96$1 at smc.vnet.net>, János <janos.lobb at yale.edu>
> wrote:
>
>> I looked for it in the archives, but found none.
>
> It is there at
>
>   http://groups.google.com/groups?threadm=6d0ch7%242no%40smc.vnet.net
>
> Also see The Mathematica Journal 1(3): 57, which is referenced at this
> link. Further comments are given below.
>
>> I am looking for ways
>> to create a custom distribution, which I can call as a function.  Here
>> is an example for illustration.  Let's say I have a list created from  
>> a
>> 4 elements alphabet  {a,b,c,d}:
>>
>> In[1]:=
>> lst={a,a,b,c,a,d,a,c,c,a}
>>
>> Out[1]=
>> {a,a,b,c,a,d,a,c,c,a}
>>
>> Distribute gives me - thanks David Park - all the two element
>> combinations of {a,b,c,d}
>>
>> In[11]:=
>> twocombs=Distribute[Table[{a,b,c,d},{2}],List]
>>
>> Out[11]=
>> {{a,a},{a,b},{a,c},{a,d},{b,a},{b,b},{b,c},{b,d},{c,a},{c,b},{c,c},{c, 
>> d}
>> ,{
>>    d,a},{d,b},{d,c},{d,d}}
>>
>> I can count the occurrence of an element of twocombs in lst with the
>> following function:
>>
>> occuranceCount[x_List] := Count[Partition[lst, 2, 1], x]
>>
>> Mapping this function over twocombs gives me the number of occurances
>> of elements of twocombs in lst:
>>
>> In[12]:=
>> distro=Map[occuranceCount,twocombs]
>>
>> Out[12]=
>> {1,1,1,1,0,0,1,0,2,0,1,0,1,0,0,0}
>>
>> It shows that for example {c,a} occurs twice, {d,a} occurs once and
>> {d,c} or {d,d} never occur.
>>
>> Now, I would like to create a distribution function called
>> twocombsLstDistribution which I could call and it would give me back
>> elements of twocombs with the probability as they occur in distro,  
>> that
>> is for on average I would get twice as much {c,a}s as {d,a}s and never
>> get {d.c} or {d,d}.
>>
>> How can I craft that ?
>
> The idea of the code below is to count for how many symbols the
> cumulative frequencies
>
>   cumfreq[x_List] := FoldList[Plus, First[x], Rest[x]]/Tr[x];
>
> are less than a fixed random number t in the range [0,1], and use the
> number of hits as the index into the alphabet.
>
>   index[f_, r_] := Length[Select[f, r >= #1 & ]] + 1;
>
>   rand[x_List, cf_List] := x[[index[cf, Random[]]]]
>
> For your distribution,
>
>   cf = cumfreq[distro]
>
> here is a randome set of elements in twocombs with the probability as
> they occur in distro.
>
>  Table[rand[twocombs, cf], {2000}];
>
> As a check we see that
>
>  Count[%, #] & /@ twocombs
>
> looks fine.
>
>> /Of course I need it for an arbitrary but finite length string lst  
>> over
>> a fixed length alphabet {a,b,c,d,....} for k-length elements of  
>> kcombs,
>> and it has to be super fast  :).  My real lst is between 30,000 and
>> 70,000 element long over a four element alphabet and I am looking for  
>> k
>> between 5 and a few hundred. /
>
> Indexing using zeroth-order Interpolation is considerably faster (See
> e.g.,  
> http://groups.google.com/groups?selm=b34q2o%24gc1%241%40smc.vnet.net):
>
>   int[distro_] := int[distro] = Interpolation[Transpose[
>     {
>      Range[0, 1, 1/Tr[distro]],
>      Join[{1}, Flatten[MapIndexed[Table[First[#2], {#1}] & , distro]]]
>     }
>   ], InterpolationOrder -> 0]
>
> If you compare
>
>   SeedRandom[1];
>   Timing[test1 = Table[rand[twocombs, cf], {100000}];]
>
> to
>
>   SeedRandom[1];
>   Timing[test2 =Table[twocombs[[int[distro][Random[]]]],{100000}];]
>
> you should find that test1 == test2 and that using int[distro] is about
> 4 times faster.
>
> Cheers,
> Paul
>
> -- 
> Paul Abbott                                   Phone: +61 8 9380 2734
> School of Physics, M013                         Fax: +61 8 9380 1014
> The University of Western Australia      (CRICOS Provider No 00126G)
> 35 Stirling Highway
> Crawley WA 6009                      mailto:paul at physics.uwa.edu.au
> AUSTRALIA                            http://physics.uwa.edu.au/~paul
>
>

-------------------------------------------------------------------
János Löbb
Yale University School of Medicine
Department of Pathology
Phone: 203-737-5204
Fax:      203-785-7303
E-mail: janos.lobb at yale.edu

References:
- Re: newbie is looking for a customDistribution function
  - From: Paul Abbott <paul@physics.uwa.edu.au>

Prev by Date: Re: Inflight magazine puzzle

Next by Date: Cases

Previous by thread: Re: newbie is looking for a customDistribution function

Next by thread: Re: newbie is looking for a customDistribution function