Re: Re: Random sampling of an arbitrary distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg59856] Re: [mg59792] Re: Random sampling of an arbitrary distribution
- From: "Owen, HL \(Hywel\)" <h.l.owen at dl.ac.uk>
- Date: Wed, 24 Aug 2005 06:30:24 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Thanks to Bill, Dennis and ggroup who all pointed out that I should have used the inverse CDF to do the sampling, e.g. the method quoted in the message below. I should have remembered that from my statistics class! ;) It is certainly true that for PDFs for which an analytic inverse function can be found, the best method is to use e.g. for an exponential PDF: pdf[x_]=Exp[-x]; lb=0; ub=Infinity; cdf[x_]=Integrate[pdf[t],{t,lb,x}, Assumptions->lb<x<ub]; inversecdf[u_]=x/.Solve[cdf[x]==u,x][[1]]; somepoints=Table[inversecdf[Random[]], {10000}]; This works fine and allows me to create samples from -Infinity to Infinity. For functions that don't have analytic inverse CDFs the choice of methods is not so clear to me. Going back to my original test function, a modulated Gaussian: ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_] := (1 + modamp*Sin[modphase + (2*Pi*x)/modlambda])/(E^((-mu + x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma) My original method is: MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b}, While[(a = Random[Real, {distmin, distmax}]; b = Random[]; b > distfn[a])]; a] MonteCarloEnsemble[numberofsamples_, distfn_, distmin_, distmax_] := Table[MonteCarloSample[distfn, distmin, distmax], {numberofsamples}] pts1 = MonteCarloEnsemble[10^4, ModulatedGaussian[#, 0.0, 1.0, 1.0, 0.6, 0] &, -5, 5]; which is inefficient but reasonably fast. Using the alternative method of using an interpolation to generate the CDF, I perform the following: NDistCDF[distfn_, x_, lowerbound_, upperbound_] := NIntegrate[distfn[ t], {t, lowerbound, x}, Method -> QuasiMonteCarlo] where the QuasiMonteCarlo method seems to be necessary to get NIntegrate to converge, and I create a dummy function for each particular case, e.g. mydist[x_] = ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0.0] The interpolation is then done using: InterpolatedInverseCDF = Interpolation[Table[{NDistCDF[mydist, x, -Infinity, Infinity], x}, {x, -5, 5, 0.01}], InterpolationOrder -> 1] This takes a really long time! However, points are then generated rapidly using: pts2 = Table[InterpolatedInverseCDF[Random[]], {10000}] and of course Histograms of pts1 and pts2 are the same. Whilst in the interpolating method I have gained speed once I have the interpolation, I have the same deficit of range limits that I had before, plus I have to limit the resolution of the interpolation (in this case steps of 0.01) to have the interpolation be calculated in a reasonable time. This doesn't seem to be better to me since I have to be careful not to miss features in the original PDF by choosing a fine enough step size. I don't have to worry about this in the original method. Have I missed something? Is there a better algorithm than this? I'd appreciate any comments. Hywel > -----Original Message----- > From: dennis [mailto:dwangsness at earthlink.net] To: mathgroup at smc.vnet.net > Sent: 21 August 2005 08:52 > Subject: [mg59856] [mg59792] Re: Random sampling of an arbitrary distribution > > Hi, > > I use the very simple method of evaluating the inverse CDF > with a uniform random value between 0 and 1 (Random[]) > > x:=inverseCDF[Random[]] > > to generate random numbers. If the inverse of the Cumulative > Distribution Function (CDF) is not available in closed form, > which it usually isn't, I use > > inverseCDF = Interpolation[Table[{CDF[x],x},{x,xmin,xmax,dx}]] > > As long as the range of x is finite you can generate random > values over the whole range. If the range is infinite then > you need to be satisfied with a range of x such as 0.000001 > < CDF[x] < 0.999999 which has always been good enough for me. > I usually set > interpolationOrder->1 because I once had a case where the > default third > order interpolation produced a CDF that was not monotonically > increasing > > With this method nothing ever needs to be thrown away in the > process of generting random samples and it seems fast. > > Dennis > > >