Re: newbie is looking for a customDistribution function
- To: mathgroup at smc.vnet.net
- Subject: [mg50402] Re: newbie is looking for a customDistribution function
- From: Paul Abbott <paul at physics.uwa.edu.au>
- Date: Thu, 2 Sep 2004 04:34:27 -0400 (EDT)
- Organization: The University of Western Australia
- References: <ch3o86$t96$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
In article <ch3o86$t96$1 at smc.vnet.net>, János <janos.lobb at yale.edu> wrote: > I looked for it in the archives, but found none. It is there at http://groups.google.com/groups?threadm=6d0ch7%242no%40smc.vnet.net Also see The Mathematica Journal 1(3): 57, which is referenced at this link. Further comments are given below. > I am looking for ways > to create a custom distribution, which I can call as a function. Here > is an example for illustration. Let's say I have a list created from a > 4 elements alphabet {a,b,c,d}: > > In[1]:= > lst={a,a,b,c,a,d,a,c,c,a} > > Out[1]= > {a,a,b,c,a,d,a,c,c,a} > > Distribute gives me - thanks David Park - all the two element > combinations of {a,b,c,d} > > In[11]:= > twocombs=Distribute[Table[{a,b,c,d},{2}],List] > > Out[11]= > {{a,a},{a,b},{a,c},{a,d},{b,a},{b,b},{b,c},{b,d},{c,a},{c,b},{c,c},{c,d} > ,{ > d,a},{d,b},{d,c},{d,d}} > > I can count the occurrence of an element of twocombs in lst with the > following function: > > occuranceCount[x_List] := Count[Partition[lst, 2, 1], x] > > Mapping this function over twocombs gives me the number of occurances > of elements of twocombs in lst: > > In[12]:= > distro=Map[occuranceCount,twocombs] > > Out[12]= > {1,1,1,1,0,0,1,0,2,0,1,0,1,0,0,0} > > It shows that for example {c,a} occurs twice, {d,a} occurs once and > {d,c} or {d,d} never occur. > > Now, I would like to create a distribution function called > twocombsLstDistribution which I could call and it would give me back > elements of twocombs with the probability as they occur in distro, that > is for on average I would get twice as much {c,a}s as {d,a}s and never > get {d.c} or {d,d}. > > How can I craft that ? The idea of the code below is to count for how many symbols the cumulative frequencies cumfreq[x_List] := FoldList[Plus, First[x], Rest[x]]/Tr[x]; are less than a fixed random number t in the range [0,1], and use the number of hits as the index into the alphabet. index[f_, r_] := Length[Select[f, r >= #1 & ]] + 1; rand[x_List, cf_List] := x[[index[cf, Random[]]]] For your distribution, cf = cumfreq[distro] here is a randome set of elements in twocombs with the probability as they occur in distro. Table[rand[twocombs, cf], {2000}]; As a check we see that Count[%, #] & /@ twocombs looks fine. > /Of course I need it for an arbitrary but finite length string lst over > a fixed length alphabet {a,b,c,d,....} for k-length elements of kcombs, > and it has to be super fast :). My real lst is between 30,000 and > 70,000 element long over a four element alphabet and I am looking for k > between 5 and a few hundred. / Indexing using zeroth-order Interpolation is considerably faster (See e.g., http://groups.google.com/groups?selm=b34q2o%24gc1%241%40smc.vnet.net): int[distro_] := int[distro] = Interpolation[Transpose[ { Range[0, 1, 1/Tr[distro]], Join[{1}, Flatten[MapIndexed[Table[First[#2], {#1}] & , distro]]] } ], InterpolationOrder -> 0] If you compare SeedRandom[1]; Timing[test1 = Table[rand[twocombs, cf], {100000}];] to SeedRandom[1]; Timing[test2 =Table[twocombs[[int[distro][Random[]]]],{100000}];] you should find that test1 == test2 and that using int[distro] is about 4 times faster. Cheers, Paul -- Paul Abbott Phone: +61 8 9380 2734 School of Physics, M013 Fax: +61 8 9380 1014 The University of Western Australia (CRICOS Provider No 00126G) 35 Stirling Highway Crawley WA 6009 mailto:paul at physics.uwa.edu.au AUSTRALIA http://physics.uwa.edu.au/~paul
- Follow-Ups:
- Re: Re: newbie is looking for a customDistribution function
- From: János <janos.lobb@yale.edu>
- Re: Re: newbie is looking for a customDistribution function