Re: and sampling a distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg110566] Re: and sampling a distribution
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Sat, 26 Jun 2010 03:09:18 -0400 (EDT)
On 6/25/10 at 7:27 AM, stone at geology.washington.edu (John Stone)
wrote:
>I am trying to use RandomReal[ ] to sample from bins of different
>widths that span the interval 0 - 1. The bin widths represent the
>weights I'm assigning to a family of trial solutions in an
>optimization problem. The aim is to sample the solutions in
>proportion to their weights using a uniform distribution of random
>numbers generated by RandomReal[ ].
>For a simple example, however, suppose there are 10 equally weighted
>solutions. My selection process would use some code that looks
>like:
>weights = Table[0.1, {10}];
>bins = Accumulate[weights];
>Select[bins, (# >= RandomReal[] &)][[1]]
Rather than RandomReal you should be using RandomChoice. Specifically,
RandomChoice[weights->bins,10]
will return a list of 10 values with the desired distribution.
This can be seen by doing:
Histogram[RandomChoice[weights -> bins, 1000]]
Note that with equal weights and equally spaced bins of size 0.1,
the following is equivalent:
RandomInteger[{1,10}]/10//N
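The same call should carry over to the unequal weights you have in mind. Here is a minimal sketch; the four weights are made up for illustration and are not from your problem, and I sample solution indices rather than bin edges, which is probably what you want for picking trial solutions:

```mathematica
(* Hypothetical weights for four trial solutions (not from the original post) *)
weights = {0.1, 0.2, 0.3, 0.4};

(* Sample solution indices 1..4 in proportion to their weights *)
n = 10000;
samples = RandomChoice[weights -> Range[Length[weights]], n];

(* Observed relative frequencies, ordered by index; these should be close to weights *)
N[(Count[samples, #] & /@ Range[Length[weights]])/n]
```

RandomChoice rescales the weights itself, so they do not need to sum to 1.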
>Assuming the result of RandomReal[ ] is uniformly distributed, I
>expected this to return 0.1 as frequently as it returns 0.5 or 1,
No, this isn't correct. Select applies the test # >= RandomReal[]
to each element of bins in turn, and RandomReal[] is evaluated
again for every element, so each bin is compared with a different
random number. The value 0.1 will be returned whenever the first
random number is less than or equal to 0.1, which should happen
10% of the time. But the value 1 will be returned only if the
tests for all nine smaller bins fail, which happens with
probability near 0.
Now consider what happens when the test for 0.1 fails. This will
occur 90% of the time. The test for 0.2 then uses a new random
number and succeeds 20% of the time, so your selection criterion
will return 0.2 as the first value in the list of selected values
with probability 0.9*0.2 = 18%. That is, 0.1 occurs with
probability 10%, 0.2 occurs with probability 18%, and 1 occurs
with very low probability (near 0).
So it is clear that the distribution produced by this selection
criterion cannot be flat as you were expecting.
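If you would rather keep the Select approach, one possible fix is to draw the random number once and reuse it for every comparison, since RandomReal[] written inside the pure function is evaluated afresh for each element tested. The following is only a sketch: pickOnce and cum are names I made up, and dividing by the last cumulative sum is my own precaution so that the final bin edge is exactly 1 despite floating-point rounding.

```mathematica
weights = Table[0.1, {10}];

(* Cumulative bin edges, rescaled so the last edge is exactly 1 *)
cum = Accumulate[weights];
bins = cum/Last[cum];

(* Draw r once, then return the first bin edge that is >= r *)
pickOnce[] := With[{r = RandomReal[]}, Select[bins, # >= r &, 1][[1]]]

(* Tally 10000 draws; each of the 10 values should appear roughly 1000 times *)
draws = Table[pickOnce[], {10000}];
SortBy[Tally[draws], First]
```

The third argument of Select stops the search at the first match, so the whole list is not tested on every draw.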
I haven't worked out the probability for the other values in the
list. I think the above is sufficient to show the selection
criterion you have used will not return uniform deviates.
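For anyone who wants the remaining probabilities, they can be checked numerically. The sketch below assumes that the test inside Select draws a new random number for each element of bins, so that the k-th of the 10 bins is returned with probability (k/10) times the product of (1 - j/10) over j < k. The names pick, exact and observed are mine, not from the original post.

```mathematica
weights = Table[0.1, {10}];
bins = Accumulate[weights];

(* Original selection: RandomReal[] is re-evaluated for every element tested *)
pick[] := Select[bins, (# >= RandomReal[] &)][[1]]

(* Exact probability that bin k is the first one to pass its own test *)
exact = N[Table[(k/10) Product[1 - j/10, {j, 1, k - 1}], {k, 1, 10}]];

(* Monte Carlo estimate from n draws *)
n = 100000;
draws = Table[pick[], {n}];
observed = N[(Count[draws, #] & /@ bins)/n];

(* Columns: returned value, exact probability, observed frequency *)
TableForm[Transpose[{bins, exact, observed}]]
```

Under that assumption the exact column starts 0.1, 0.18, 0.216 and ends at 9!/10^9, about 0.00036, and the observed frequencies should track it to within sampling noise.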