Re: Re: newbie is looking for a customDistribution function
- To: mathgroup at smc.vnet.net
- Subject: [mg50424] Re: [mg50402] Re: newbie is looking for a customDistribution function
- From: János <janos.lobb at yale.edu>
- Date: Fri, 3 Sep 2004 03:35:10 -0400 (EDT)
- References: <ch3o86$t96$1@smc.vnet.net> <200409020834.EAA02053@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Thank you all for your reply - on and off list. I gained a good amount of information by looking at the referenced material. It is amazing to see how much knowledge has been accumulated via this list. I will try all suggestions on the weekend. Thanks again, János On Sep 2, 2004, at 4:34 AM, Paul Abbott wrote: > In article <ch3o86$t96$1 at smc.vnet.net>, János <janos.lobb at yale.edu> > wrote: > >> I looked for it in the archives, but found none. > > It is there at > > http://groups.google.com/groups?threadm=6d0ch7%242no%40smc.vnet.net > > Also see The Mathematica Journal 1(3): 57, which is referenced at this > link. Further comments are given below. > >> I am looking for ways >> to create a custom distribution, which I can call as a function. Here >> is an example for illustration. Let's say I have a list created from >> a >> 4 elements alphabet {a,b,c,d}: >> >> In[1]:= >> lst={a,a,b,c,a,d,a,c,c,a} >> >> Out[1]= >> {a,a,b,c,a,d,a,c,c,a} >> >> Distribute gives me - thanks David Park - all the two element >> combinations of {a,b,c,d} >> >> In[11]:= >> twocombs=Distribute[Table[{a,b,c,d},{2}],List] >> >> Out[11]= >> {{a,a},{a,b},{a,c},{a,d},{b,a},{b,b},{b,c},{b,d},{c,a},{c,b},{c,c},{c, >> d} >> ,{ >> d,a},{d,b},{d,c},{d,d}} >> >> I can count the occurrence of an element of twocombs in lst with the >> following function: >> >> occuranceCount[x_List] := Count[Partition[lst, 2, 1], x] >> >> Mapping this function over twocombs gives me the number of occurances >> of elements of twocombs in lst: >> >> In[12]:= >> distro=Map[occuranceCount,twocombs] >> >> Out[12]= >> {1,1,1,1,0,0,1,0,2,0,1,0,1,0,0,0} >> >> It shows that for example {c,a} occurs twice, {d,a} occurs once and >> {d,c} or {d,d} never occur. >> >> Now, I would like to create a distribution function called >> twocombsLstDistribution which I could call and it would give me back >> elements of twocombs with the probability as they occur in distro, >> that >> is for on average I would get twice as much {c,a}s as {d,a}s and never >> get {d.c} or {d,d}. >> >> How can I craft that ? > > The idea of the code below is to count for how many symbols the > cumulative frequencies > > cumfreq[x_List] := FoldList[Plus, First[x], Rest[x]]/Tr[x]; > > are less than a fixed random number t in the range [0,1], and use the > number of hits as the index into the alphabet. > > index[f_, r_] := Length[Select[f, r >= #1 & ]] + 1; > > rand[x_List, cf_List] := x[[index[cf, Random[]]]] > > For your distribution, > > cf = cumfreq[distro] > > here is a randome set of elements in twocombs with the probability as > they occur in distro. > > Table[rand[twocombs, cf], {2000}]; > > As a check we see that > > Count[%, #] & /@ twocombs > > looks fine. > >> /Of course I need it for an arbitrary but finite length string lst >> over >> a fixed length alphabet {a,b,c,d,....} for k-length elements of >> kcombs, >> and it has to be super fast :). My real lst is between 30,000 and >> 70,000 element long over a four element alphabet and I am looking for >> k >> between 5 and a few hundred. / > > Indexing using zeroth-order Interpolation is considerably faster (See > e.g., > http://groups.google.com/groups?selm=b34q2o%24gc1%241%40smc.vnet.net): > > int[distro_] := int[distro] = Interpolation[Transpose[ > { > Range[0, 1, 1/Tr[distro]], > Join[{1}, Flatten[MapIndexed[Table[First[#2], {#1}] & , distro]]] > } > ], InterpolationOrder -> 0] > > If you compare > > SeedRandom[1]; > Timing[test1 = Table[rand[twocombs, cf], {100000}];] > > to > > SeedRandom[1]; > Timing[test2 =Table[twocombs[[int[distro][Random[]]]],{100000}];] > > you should find that test1 == test2 and that using int[distro] is about > 4 times faster. > > Cheers, > Paul > > -- > Paul Abbott Phone: +61 8 9380 2734 > School of Physics, M013 Fax: +61 8 9380 1014 > The University of Western Australia (CRICOS Provider No 00126G) > 35 Stirling Highway > Crawley WA 6009 mailto:paul at physics.uwa.edu.au > AUSTRALIA http://physics.uwa.edu.au/~paul > > ------------------------------------------------------------------- János Löbb Yale University School of Medicine Department of Pathology Phone: 203-737-5204 Fax: 203-785-7303 E-mail: janos.lobb at yale.edu
- References:
- Re: newbie is looking for a customDistribution function
- From: Paul Abbott <paul@physics.uwa.edu.au>
- Re: newbie is looking for a customDistribution function