MathGroup Archive: August 2005 [00656]

[Date Index] [Thread Index] [Author Index]

Re: Re: Random sampling of an arbitrary distribution

To: mathgroup at smc.vnet.net
Subject: [mg59856] Re: [mg59792] Re: Random sampling of an arbitrary distribution
From: "Owen, HL \(Hywel\)" <h.l.owen at dl.ac.uk>
Date: Wed, 24 Aug 2005 06:30:24 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

Thanks to Bill, Dennis and ggroup who all pointed out that I should have
used the inverse CDF to do the sampling, e.g. the method quoted in the
message below. I should have remembered that from my statistics class!
;)

It is certainly true that for PDFs for which an analytic inverse
function can be found, the best method is to use e.g. for an exponential
PDF:

pdf[x_]=Exp[-x];
lb=0; ub=Infinity;
cdf[x_]=Integrate[pdf[t],{t,lb,x}, Assumptions->lb<x<ub];
inversecdf[u_]=x/.Solve[cdf[x]==u,x][[1]];
somepoints=Table[inversecdf[Random[]], {10000}];

This works fine and allows me to create samples from -Infinity to
Infinity.

For functions that don't have analytic inverse CDFs the choice of
methods is not so clear to me. Going back to my original test function,
a modulated Gaussian:

ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_] := (1
+ modamp*Sin[modphase + (2*Pi*x)/modlambda])/(E^((-mu +
x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma)

My original method is:

MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b},
    While[(a = Random[Real, {distmin, distmax}]; b = Random[];
        b > distfn[a])];
    a]

MonteCarloEnsemble[numberofsamples_, 
    distfn_, distmin_, distmax_] := Table[MonteCarloSample[distfn,
distmin, 
    distmax], {numberofsamples}]

pts1 = MonteCarloEnsemble[10^4, ModulatedGaussian[#, 0.0, 1.0, 
      1.0, 0.6, 0] &, -5, 5];

which is inefficient but reasonably fast. Using the alternative method
of using an interpolation to generate the CDF, I perform the following:

NDistCDF[distfn_, x_, lowerbound_, upperbound_] := NIntegrate[distfn[
    t], {t, lowerbound, x}, Method -> QuasiMonteCarlo]

where the QuasiMonteCarlo method seems to be necessary to get NIntegrate
to converge, and I create a dummy function for each particular case,
e.g.

mydist[x_] = ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0.0]

The interpolation is then done using:

InterpolatedInverseCDF = Interpolation[Table[{NDistCDF[mydist, x,
-Infinity, 
    Infinity], x}, {x, -5, 5, 0.01}], InterpolationOrder -> 1]

This takes a really long time! However, points are then generated
rapidly using:

pts2 = Table[InterpolatedInverseCDF[Random[]], {10000}]

and of course Histograms of pts1 and pts2 are the same. Whilst in the
interpolating method I have gained speed once I have the interpolation,
I have the same deficit of range limits that I had before, plus I have
to limit the resolution of the interpolation (in this case steps of
0.01) to have the interpolation be calculated in a reasonable time. This
doesn't seem to be better to me since I have to be careful not to miss
features in the original PDF by choosing a fine enough step size. I
don't have to worry about this in the original method. Have I missed
something? Is there a better algorithm than this?

I'd appreciate any comments.

Hywel

> -----Original Message-----
> From: dennis [mailto:dwangsness at earthlink.net] 
To: mathgroup at smc.vnet.net
> Sent: 21 August 2005 08:52
> Subject: [mg59856] [mg59792] Re: Random sampling of an arbitrary distribution
> 
> Hi,
> 
> I use the very simple method of evaluating the inverse CDF 
> with a uniform random value between 0 and 1 (Random[])
> 
> x:=inverseCDF[Random[]]
> 
> to generate random numbers.  If the inverse of the Cumulative 
> Distribution Function (CDF) is not available in closed form, 
> which it usually isn't, I use
> 
> inverseCDF = Interpolation[Table[{CDF[x],x},{x,xmin,xmax,dx}]]
> 
> As long as the range of x is finite you can generate random 
> values over the whole range.  If the range is infinite then 
> you need to be satisfied with a range of x  such as 0.000001 
> < CDF[x] < 0.999999 which has always been good enough for me. 
>  I usually set
> interpolationOrder->1 because I once had a case where the 
> default third
> order interpolation produced a CDF that was not monotonically 
> increasing
> 
> With this method nothing ever needs to be thrown away in the 
> process of generting random samples and it seems fast.
> 
> Dennis
> 
> 
>

Prev by Date: Re: Re: Simplifying Conjugate[] with 5.2 Mac

Next by Date: Re: Re: Simplifying Conjugate[] with 5.2 Mac

Previous by thread: Re: Random sampling of an arbitrary distribution

Next by thread: Re: Random sampling of an arbitrary distribution