MathGroup Archive: September 2004 [00039]

[Date Index] [Thread Index] [Author Index]

Re: newbie is looking for a customDistribution function

To: mathgroup at smc.vnet.net
Subject: [mg50402] Re: newbie is looking for a customDistribution function
From: Paul Abbott <paul at physics.uwa.edu.au>
Date: Thu, 2 Sep 2004 04:34:27 -0400 (EDT)
Organization: The University of Western Australia
References: <ch3o86$t96$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

In article <ch3o86$t96$1 at smc.vnet.net>, János <janos.lobb at yale.edu> 
wrote:

> I looked for it in the archives, but found none. 

It is there at

  http://groups.google.com/groups?threadm=6d0ch7%242no%40smc.vnet.net

Also see The Mathematica Journal 1(3): 57, which is referenced at this 
link. Further comments are given below.

> I am looking for ways  
> to create a custom distribution, which I can call as a function.  Here  
> is an example for illustration.  Let's say I have a list created from a  
> 4 elements alphabet  {a,b,c,d}:
> 
> In[1]:=
> lst={a,a,b,c,a,d,a,c,c,a}
> 
> Out[1]=
> {a,a,b,c,a,d,a,c,c,a}
> 
> Distribute gives me - thanks David Park - all the two element  
> combinations of {a,b,c,d}
> 
> In[11]:=
> twocombs=Distribute[Table[{a,b,c,d},{2}],List]
> 
> Out[11]=
> {{a,a},{a,b},{a,c},{a,d},{b,a},{b,b},{b,c},{b,d},{c,a},{c,b},{c,c},{c,d} 
> ,{
>    d,a},{d,b},{d,c},{d,d}}
> 
> I can count the occurrence of an element of twocombs in lst with the  
> following function:
> 
> occuranceCount[x_List] := Count[Partition[lst, 2, 1], x]
> 
> Mapping this function over twocombs gives me the number of occurances  
> of elements of twocombs in lst:
> 
> In[12]:=
> distro=Map[occuranceCount,twocombs]
> 
> Out[12]=
> {1,1,1,1,0,0,1,0,2,0,1,0,1,0,0,0}
> 
> It shows that for example {c,a} occurs twice, {d,a} occurs once and  
> {d,c} or {d,d} never occur.
> 
> Now, I would like to create a distribution function called  
> twocombsLstDistribution which I could call and it would give me back  
> elements of twocombs with the probability as they occur in distro, that  
> is for on average I would get twice as much {c,a}s as {d,a}s and never  
> get {d.c} or {d,d}.
> 
> How can I craft that ?

The idea of the code below is to count for how many symbols the 
cumulative frequencies

  cumfreq[x_List] := FoldList[Plus, First[x], Rest[x]]/Tr[x]; 

are less than a fixed random number t in the range [0,1], and use the 
number of hits as the index into the alphabet.

  index[f_, r_] := Length[Select[f, r >= #1 & ]] + 1; 

  rand[x_List, cf_List] := x[[index[cf, Random[]]]]

For your distribution,

  cf = cumfreq[distro]

here is a randome set of elements in twocombs with the probability as 
they occur in distro.

 Table[rand[twocombs, cf], {2000}]; 

As a check we see that

 Count[%, #] & /@ twocombs

looks fine.

> /Of course I need it for an arbitrary but finite length string lst over  
> a fixed length alphabet {a,b,c,d,....} for k-length elements of kcombs,  
> and it has to be super fast  :).  My real lst is between 30,000 and  
> 70,000 element long over a four element alphabet and I am looking for k  
> between 5 and a few hundred. /

Indexing using zeroth-order Interpolation is considerably faster (See 
e.g., http://groups.google.com/groups?selm=b34q2o%24gc1%241%40smc.vnet.net):

  int[distro_] := int[distro] = Interpolation[Transpose[
    {
     Range[0, 1, 1/Tr[distro]],      
     Join[{1}, Flatten[MapIndexed[Table[First[#2], {#1}] & , distro]]]
    }
  ], InterpolationOrder -> 0]

If you compare

  SeedRandom[1]; 
  Timing[test1 = Table[rand[twocombs, cf], {100000}];]

to

  SeedRandom[1]; 
  Timing[test2 =Table[twocombs[[int[distro][Random[]]]],{100000}];]

you should find that test1 == test2 and that using int[distro] is about 
4 times faster.

Cheers,
Paul

-- 
Paul Abbott                                   Phone: +61 8 9380 2734
School of Physics, M013                         Fax: +61 8 9380 1014
The University of Western Australia      (CRICOS Provider No 00126G)         
35 Stirling Highway
Crawley WA 6009                      mailto:paul at physics.uwa.edu.au 
AUSTRALIA                            http://physics.uwa.edu.au/~paul

Follow-Ups:
- Re: Re: newbie is looking for a customDistribution function
  - From: János <janos.lobb@yale.edu>

Prev by Date: Re: newbie is looking for a customDistribution function

Next by Date: Re: Inflight magazine puzzle

Previous by thread: Re: newbie is looking for a customDistribution function

Next by thread: Re: Re: newbie is looking for a customDistribution function