MathGroup Archive 2005


Re: Random sampling of an arbitrary distribution

  • To: mathgroup at smc.vnet.net
  • Subject: [mg59925] Re: Random sampling of an arbitrary distribution
  • From: Bill Rowe <readnewsciv at earthlink.net>
  • Date: Thu, 25 Aug 2005 06:34:38 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

On 8/24/05 at 6:30 AM, h.l.owen at dl.ac.uk (Owen, HL (Hywel)) wrote:

>Thanks to Bill, Dennis and ggroup who all pointed out that I should
>have used the inverse CDF to do the sampling, e.g. the method
>quoted in the message below. I should have remembered that from my
>statistics class! ;)

>It is certainly true that for PDFs for which an analytic inverse
>function can be found, the best method is to use e.g. for an
>exponential PDF:

>pdf[x_]=Exp[-x]; lb=0; ub=Infinity;
>cdf[x_]=Integrate[pdf[t],{t,lb,x}, Assumptions->lb<x<ub];
>inversecdf[u_]=x/.Solve[cdf[x]==u,x][[1]];
>somepoints=Table[inversecdf[Random[]], {10000}];

>This works fine and allows me to create samples from -Infinity to
>Infinity.

>For functions that don't have analytic inverse CDFs the choice of
>methods is not so clear to me. Going back to my original test
>function, a modulated Gaussian:

>ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_]
>:= (1 + modamp*Sin[modphase + (2*Pi*x)/modlambda])/(E^((-mu +
>x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma)

Are you aware the expression above cannot be a density function? A density function must integrate to 1 over its domain. To see that this expression does not satisfy that requirement in general, set modphase to 0 and modlambda to 1. The expression can then be written as the pdf of a normal distribution plus modamp times Sin[2 Pi x] times that same pdf. The integral of the latter term over the real line is modamp times the expected value of Sin[2 Pi x] when x has a normal distribution, which is not zero in general.

Using mathStatica,

f = (1/(\[Sigma]*Sqrt[2*Pi]))*Exp[-((x - \[Mu])^2/(2*\[Sigma]^2))];
domain[f] = {x, -Infinity, Infinity} &&
  {\[Mu] \[Element] Reals, \[Sigma] > 0};

Expect[a*Sin[2*Pi*x], f]

(a*Sin[2*Pi*\[Mu]])/ E^(2*Pi^2*\[Sigma]^2)

That is, the expectation is nonzero in general, so the integral of the original expression over its domain comes out to 1 + (a*Sin[2*Pi*\[Mu]])/E^(2*Pi^2*\[Sigma]^2) (with a playing the role of modamp), which does not evaluate to 1.
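For readers without mathStatica, essentially the same check can be done with plain Integrate. A minimal sketch, assuming modphase -> 0 and modlambda -> 1 as above (the name pdfcandidate is mine, for illustration only):

(* candidate density: normal pdf modulated by a sine term *)
pdfcandidate[x_, mu_, sigma_, a_] :=
  (1 + a*Sin[2*Pi*x])*Exp[-(x - mu)^2/(2*sigma^2)]/(Sqrt[2*Pi]*sigma)

Integrate[pdfcandidate[x, mu, sigma, a], {x, -Infinity, Infinity},
  Assumptions -> {sigma > 0, Element[mu, Reals], Element[a, Reals]}]

This should return 1 + (a*Sin[2*Pi*mu])/E^(2*Pi^2*sigma^2), consistent with the mathStatica result above, and it equals 1 only when the sine term happens to vanish.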

Hence the usual methods for generating pseudo random numbers cannot be relied on in this instance.

>My original method is:

>MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b},
>While[(a = Random[Real, {distmin, distmax}]; b = Random[]; b >
>distfn[a])]; a]

>MonteCarloEnsemble[numberofsamples_, 
>distfn_, distmin_, distmax_] := Table[MonteCarloSample[distfn,
>distmin, 
>distmax], {numberofsamples}]

>pts1 = MonteCarloEnsemble[10^4, ModulatedGaussian[#, 0.0, 1.0, 
>1.0, 0.6, 0] &, -5, 5];

Normally, when someone speaks of getting random numbers from a specified density function they mean the probability of getting a number less than a specified value is equal to the cumulative distribution function evaluated at that value. What you are doing above is distinctly different. You are selecting random points conditioned on a second, independently drawn random number from (0,1) being less than the value of an expression (intended to be the density function?) at that point. This is not equivalent to generating random numbers with a specified density function.

For example, suppose your density function were a truncated exponential, i.e., Exp[-x]/(Exp[-a] - Exp[-b]) with domain (a, b) = (100, 10^6). Since Exp[-100] is very much smaller than 1, the output of your function MonteCarloSample will be approximately the same as that from Random[Real, {100, 10^6}], i.e., approximately uniform from 100 to 10^6 and definitely not from a truncated exponential.

If all you need is to sample an expression at random points over its domain, then simply doing

f[Random[Real, {a, b}]]

will do when a and b are finite. If one endpoint is infinite, use a function that maps the interval (0,1) to, say, (a, Infinity) and use the result as the point at which to sample your expression.

For example, -Log[Random[]] will generate a random number between 0 and Infinity.

Of course, the distribution of these samples will not be uniform. If you need the samples to be uniform (as in coming from a uniform distribution) then set a = 0 and b to the largest machine number possible on your system. It isn't possible to get closer to (0, Infinity) with machine precision.
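Concretely, a minimal sketch of both cases (the expression f and the endpoints a and b below are placeholders of my own, not from the original post):

f[x_] := Exp[-x]*Sin[x]^2;  (* example expression to sample *)
a = 0; b = 10;

(* finite interval (a, b): evaluate f at uniform random points *)
finiteSamples = Table[f[Random[Real, {a, b}]], {1000}];

(* interval (a, Infinity): map (0, 1) onto (a, Infinity) with
   a - Log[Random[]] and evaluate f there; these sample points
   are not uniformly distributed over the interval *)
semiInfiniteSamples = Table[f[a - Log[Random[]]], {1000}];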

OTOH, if you do want random numbers from a specified density function, it is possible to achieve this even when the cumulative distribution function cannot be easily evaluated. But rather than attempt to explain the possibilities here, I strongly suggest you consult established texts. An excellent starting point is Knuth's Seminumerical Algorithms (The Art of Computer Programming, Volume 2). Also, Mathematical Statistics with Mathematica by Colin Rose and Murray Smith has a discussion of generating random numbers. This text comes with a Mathematica package (mathStatica) that significantly extends Mathematica's ability to solve statistical problems. Methods for generating random numbers from distributions whose cumulative distribution function cannot be explicitly computed are discussed in both texts.
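One widely known approach of that kind is acceptance-rejection sampling with the comparison scaled by an upper bound on the density. A minimal sketch (my own illustration, not a recipe taken verbatim from either text), assuming the target pdf is a genuine density, bounded on a finite interval, and that fmax is an upper bound for it there; RejectionSample is an illustrative name:

(* keep proposing uniform points until one falls under the curve *)
RejectionSample[pdf_, xmin_, xmax_, fmax_] :=
  Module[{x, u},
    While[
      (x = Random[Real, {xmin, xmax}];
       u = Random[Real, {0, fmax}];
       u > pdf[x])];
    x]

(* example: 10^3 draws from a standard normal truncated to (-5, 5),
   whose maximum value is 1/Sqrt[2 Pi] at x = 0 *)
samples = Table[
   RejectionSample[Exp[-#^2/2]/Sqrt[2*Pi] &, -5, 5, 1/Sqrt[2*Pi]],
   {10^3}];

This differs from MonteCarloSample above only in that the comparison is made against Random[Real, {0, fmax}] rather than Random[], so the acceptance test is properly scaled to the height of the density.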

A couple of final thoughts. Creating a pseudo random number generator with the desired properties is anything but a trivial task. It is extremely easy to create a very bad generator, where bad means the numbers generated have few if any of the desired properties. Knuth discusses several examples of bad generators.

Also, verifying that a particular generator has the desired properties is very time consuming.
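As one very small example of the kind of check involved (nowhere near the batteries of tests Knuth describes), a crude Kolmogorov-Smirnov-style comparison of an empirical CDF against the target CDF can be coded directly; targetCDF and ksStatistic below are illustrative names of my own:

(* target: unit-rate exponential, CDF 1 - Exp[-x] *)
targetCDF[x_] := 1 - Exp[-x];

(* largest gap between the empirical CDF of the data and targetCDF *)
ksStatistic[data_] :=
  Module[{sorted = Sort[data], n = Length[data]},
    Max[Table[Abs[i/n - targetCDF[sorted[[i]]]], {i, n}]]]

ksStatistic[Table[-Log[Random[]], {10^4}]]

For 10^4 samples from the right distribution this statistic should be small, on the order of 0.01, and even then it only probes the one-dimensional distribution, not serial correlations or the other properties a good generator needs.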

--
To reply via email subtract one hundred and four

