Re: Re: Random sampling of an arbitrary distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg59856] Re: [mg59792] Re: Random sampling of an arbitrary distribution
- From: "Owen, HL \(Hywel\)" <h.l.owen at dl.ac.uk>
- Date: Wed, 24 Aug 2005 06:30:24 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Thanks to Bill, Dennis and ggroup who all pointed out that I should have
used the inverse CDF to do the sampling, e.g. the method quoted in the
message below. I should have remembered that from my statistics class!
;)
It is certainly true that for PDFs for which an analytic inverse
function can be found, the best method is to use e.g. for an exponential
PDF:
pdf[x_]=Exp[-x];
lb=0; ub=Infinity;
cdf[x_]=Integrate[pdf[t],{t,lb,x}, Assumptions->lb<x<ub];
inversecdf[u_]=x/.Solve[cdf[x]==u,x][[1]];
somepoints=Table[inversecdf[Random[]], {10000}];
This works fine and allows me to create samples from -Infinity to
Infinity.
For functions that don't have analytic inverse CDFs the choice of
methods is not so clear to me. Going back to my original test function,
a modulated Gaussian:
ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_] := (1
+ modamp*Sin[modphase + (2*Pi*x)/modlambda])/(E^((-mu +
x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma)
My original method is:
MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b},
While[(a = Random[Real, {distmin, distmax}]; b = Random[];
b > distfn[a])];
a]
MonteCarloEnsemble[numberofsamples_,
distfn_, distmin_, distmax_] := Table[MonteCarloSample[distfn,
distmin,
distmax], {numberofsamples}]
pts1 = MonteCarloEnsemble[10^4, ModulatedGaussian[#, 0.0, 1.0,
1.0, 0.6, 0] &, -5, 5];
which is inefficient but reasonably fast. Using the alternative method
of using an interpolation to generate the CDF, I perform the following:
NDistCDF[distfn_, x_, lowerbound_, upperbound_] := NIntegrate[distfn[
t], {t, lowerbound, x}, Method -> QuasiMonteCarlo]
where the QuasiMonteCarlo method seems to be necessary to get NIntegrate
to converge, and I create a dummy function for each particular case,
e.g.
mydist[x_] = ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0.0]
The interpolation is then done using:
InterpolatedInverseCDF = Interpolation[Table[{NDistCDF[mydist, x,
-Infinity,
Infinity], x}, {x, -5, 5, 0.01}], InterpolationOrder -> 1]
This takes a really long time! However, points are then generated
rapidly using:
pts2 = Table[InterpolatedInverseCDF[Random[]], {10000}]
and of course Histograms of pts1 and pts2 are the same. Whilst in the
interpolating method I have gained speed once I have the interpolation,
I have the same deficit of range limits that I had before, plus I have
to limit the resolution of the interpolation (in this case steps of
0.01) to have the interpolation be calculated in a reasonable time. This
doesn't seem to be better to me since I have to be careful not to miss
features in the original PDF by choosing a fine enough step size. I
don't have to worry about this in the original method. Have I missed
something? Is there a better algorithm than this?
I'd appreciate any comments.
Hywel
> -----Original Message-----
> From: dennis [mailto:dwangsness at earthlink.net]
To: mathgroup at smc.vnet.net
> Sent: 21 August 2005 08:52
> Subject: [mg59856] [mg59792] Re: Random sampling of an arbitrary distribution
>
> Hi,
>
> I use the very simple method of evaluating the inverse CDF
> with a uniform random value between 0 and 1 (Random[])
>
> x:=inverseCDF[Random[]]
>
> to generate random numbers. If the inverse of the Cumulative
> Distribution Function (CDF) is not available in closed form,
> which it usually isn't, I use
>
> inverseCDF = Interpolation[Table[{CDF[x],x},{x,xmin,xmax,dx}]]
>
> As long as the range of x is finite you can generate random
> values over the whole range. If the range is infinite then
> you need to be satisfied with a range of x such as 0.000001
> < CDF[x] < 0.999999 which has always been good enough for me.
> I usually set
> interpolationOrder->1 because I once had a case where the
> default third
> order interpolation produced a CDF that was not monotonically
> increasing
>
> With this method nothing ever needs to be thrown away in the
> process of generting random samples and it seems fast.
>
> Dennis
>
>
>