Re: RandomReal[] and sampling a distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg110574] Re: RandomReal[] and sampling a distribution
- From: Peter Pein <petsie at dordos.net>
- Date: Sat, 26 Jun 2010 03:10:46 -0400 (EDT)
- References: <i023oo$r8a$1@smc.vnet.net>
Am Fri, 25 Jun 2010 11:26:16 +0000 (UTC) schrieb John Stone <stone at geology.washington.edu>: > I hope this is a simple question (perhaps too simple for the > archives, which I did search). Thanks in advance for any help. > > I am trying to use RandomReal[ ] to sample from bins of different > widths that span the interval 0 - 1. The bin widths represent the > weights I'm assigning to a family of trial solutions in an > optimization problem. The aim is to sample the solutions in > proportion to their weights using a uniform distribution of random > numbers generated by RandomReal[ ]. > > For a simple example, however, suppose there are 10 equally weighted > solutions. My selection process would use some code that looks like: > > weights = Table[0.1, {10}]; > bins = Accumulate[weights]; > Select[bins, (# >= RandomReal[] &)][[1]] > > Assuming the result of RandomReal[ ] is uniformly distributed, I > expected this to return 0.1 as frequently as it returns 0.5 or 1, but > it seems to return a distribution of values peaked around 0.5 (and > seldom returns 0.9 or 1). > > Graphically, the following I think is equivalent, and gives me a > peaked, not a flat distribution: > > Histogram[ > Table[ > Select[{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, > (# >= RandomReal[ ] &)][[1]], > {1000}] > ] > > Shouldn't the procedure in my example generate a flat distribution? > Or am I missing something really simple? > > Thanks again if you can help me solve this. it is the call to Select[] which has a not too obvious trap: Select takes the list bins and compares the first element (0.1) with a random number (say 0.123). So 0.1 will not be in the result of Select[]. Next it compares 0.2 from the bins-list with _another_ random number (say 0.3141). =.2 should have been in the result if the random value were the same. If you want to do it this way (Select[] is slow, compared to the following suggestion), try: (tmp = RandomReal[]; Select[bins, (# >= tmp &)][[1]]). A simple test shows that this method needs ~6 seconds on my laptop for 10^5 samples: With[{samples=100000}, BinCounts[Table[tmp=RandomReal[];Select[bins,(#>=tmp&)][[1]],{samples}],{bins}]/samples]//N//Timing {6.59,{0.10073,0.10007,0.09997,0.09944,0.09995,0.10008,0.09964,0.09939,0.10052}} Using built in RandomChoice[] is about 3 times faster and has less program text: With[{samples=100000}, BinCounts[RandomChoice[weights->bins,samples],{bins}]/samples]//N//Timing {2.1,{0.1009,0.10192,0.09936,0.10089,0.09974,0.09733,0.09881,0.09986,0.10099}} Hope that helps, Peter