Re: Re: Random sampling of an arbitrary distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg59950] Re: [mg59925] Re: Random sampling of an arbitrary distribution
- From: "Owen, HL \(Hywel\)" <h.l.owen at dl.ac.uk>
- Date: Fri, 26 Aug 2005 04:53:41 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Hi Bill,
Thanks for the reply. First off, I should point out that I am a physicist (i.e. small maths brain!) rather than a mathematician (big maths brain). I guess it shows! ;) As you can see, I'm a beginner in the area of random sampling: I'm trying to get to grips with pseudo- and quasi-random sequences, and am reading up on low-discrepancy sequences at the moment. My particular interest is in how the choice of sequence type affects a simulation. I have a simulation which physically has a modulation as below, although I chose that particular function as a plausible one rather than it 'having' to be that one. But it appears to me at the moment that it integrates OK (see below):
>
> >ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_] :=
> >  (1 + modamp*Sin[modphase + (2*Pi*x)/modlambda])/
> >   (E^((-mu + x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma)
>
> Are you aware the expression above cannot be a density
> function? An expression for a density function must integrate
> to 1 over its domain. To see that this expression does not
> satisfy this in general, set modphase to 0 and modlambda to 1.
> Now note the expression can be written as the sum of the pdf
> for a normal distribution and Sin[2 Pi x] times the pdf for a
> normal distribution. The integral of the latter term is the
> expected value of Sin[2 Pi x] when x has a normal distribution.
>
> Using mathStatica,
>
> f = (1/(\[Sigma]*Sqrt[2*Pi]))*
>     Exp[-((x - \[Mu])^2/(2*\[Sigma]^2))];
> domain[f] = {x, -Infinity, Infinity} &&
>     {\[Mu] \[Element] Reals, \[Sigma] > 0};
>
> Expect[a*Sin[2*Pi*x], f]
>
> (a*Sin[2*Pi*\[Mu]])/E^(2*Pi^2*\[Sigma]^2)
>
> That is, the integral over the domain does not evaluate to 1.
That's weird, because when I integrate it:
Assuming[x \[Element] Reals && mu > 0 && sigma > 0 && modamp > 0 &&
   modlambda > 0 && modphase \[Element] Reals,
  Integrate[ModulatedGaussian[x, mu, sigma, modamp, modlambda, modphase],
   {x, -Infinity, Infinity}]]
I get 1, e.g.

1 + (modamp*Sin[modphase + (2*mu*Pi)/modlambda])/
    E^((2*Pi^2*sigma^2)/modlambda^2) /.
  {modlambda -> 1, modphase -> 0, mu -> 1, sigma -> 1}
I accept that you're right, but I don't see why!
> Hence the usual methods for generating pseudo random numbers
> cannot be relied on in this instance.
>
> >My original method is:
>
> >MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b},
> >  While[(a = Random[Real, {distmin, distmax}]; b = Random[];
> >    b > distfn[a])]; a]
>
> >MonteCarloEnsemble[numberofsamples_, distfn_, distmin_, distmax_] :=
> >  Table[MonteCarloSample[distfn, distmin, distmax], {numberofsamples}]
>
> >pts1 = MonteCarloEnsemble[10^4,
> >  ModulatedGaussian[#, 0.0, 1.0, 1.0, 0.6, 0] &, -5, 5];
>
> Normally, when someone speaks of getting random numbers from
> a specified density function they mean that the probability of
> getting a number less than a specified value is equal to the
> cumulative distribution function evaluated at that specified
> value. What you are doing above is distinctly different. You
> are selecting random points conditioned on having the value
> of an expression (intended to be the density function?) at
> that point less than another random point. This is
> not equivalent to generating random numbers with a specified
> density function.
Yes, but I thought that since the sampled points match the density function, it was a valid method. Is there a bias or something that isn't visible?
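For concreteness, here is the kind of check I have in mind (just a sketch using the definitions quoted above; the 0.25-wide bins and the parameter values are my own arbitrary choices): compare the fraction of samples falling in each bin with the integral of the density over that bin.

pts = MonteCarloEnsemble[10^4,
   ModulatedGaussian[#, 0.0, 1.0, 1.0, 0.6, 0] &, -5, 5];

(* fraction of the samples falling in the bin [lo, lo + 0.25) *)
binFraction[lo_] := N[Count[pts, p_ /; lo <= p < lo + 0.25]/Length[pts]]

(* probability mass of the density over the same bin *)
binProb[lo_] := NIntegrate[
   ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0], {x, lo, lo + 0.25}]

(* the two columns should agree to within sampling noise *)
Table[{lo, binFraction[lo], binProb[lo]}, {lo, -2., 2., 0.25}]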
>
> If all you need is to sample an expression at random points
> over its domain then simply doing
>
> f[Random[Real,{a,b}]] will do when a and b are finite.
No, that's not what I'm trying to do. I'm trying to create a set of sample points whose density varies according to the density function I'm using. If f is the inverse CDF then your method works - whether f is analytic or an interpolation. I've tried it for the modulated Gaussian case and it appears to work for me, at least when using an Interpolation for the inverse CDF.
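In case it's useful, this is roughly what I do (a sketch only; the 0.1 grid spacing and the truncation to [-5, 5] are arbitrary choices of mine):

(* tabulate the CDF of the modulated Gaussian, truncated to [-5, 5] *)
cdfData = Table[{NIntegrate[ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0],
      {x, -5, u}], u}, {u, -4.9, 4.9, 0.1}];

(* interpolating x against the CDF value gives the inverse CDF *)
invCDF = Interpolation[cdfData];

(* feed uniform deviates through the inverse CDF, restricted to the
   tabulated range so the interpolation never extrapolates *)
{cdfmin, cdfmax} = {cdfData[[1, 1]], cdfData[[-1, 1]]};
samples = Table[invCDF[Random[Real, {cdfmin, cdfmax}]], {10^4}];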
> OTOH, if you do want random numbers from a specified density
> function it is possible to achieve this when the cumulative
> distribution function cannot be easily evaluated. But rather
> than attempt to explain the possibilities here I strongly
> suggest you consult established texts. An excellent starting
> point is Knuth's Seminumerical Algorithms Vol 2.
Yes, I have it in front of me now!
My sampling method above appears to me to be a form of von Neumann's rejection method, so I can't see why it isn't valid statistically. To look at the difference between pseudo- and quasi-random sequences, I am also trying:
QuasiRandomMonteCarloEnsemble[numberofsamples_, distfn_, distmin_, distmax_] :=
  Module[{a, b, samples = {}, sequencenumber},
   For[sequencenumber = 1, Length[samples] < numberofsamples, sequencenumber++,
    (* take the next 2D Halton point: a picks the abscissa,
       b drives the rejection test *)
    {a, b} = Halton[sequencenumber, 2];
    If[b <= distfn[distmin + (distmax - distmin) a],
     AppendTo[samples, distmin + (distmax - distmin) a]]];
   samples]
where
Halton[n_, s_] := vanderCorput[n, Prime[Range[s]]];
and
vanderCorput[n_, b_] :=
  IntegerDigits[n, b] . (b^Range[-Floor[Log[b, n] + 1], -1]);
Attributes[vanderCorput] = Listable;
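As a quick check, the first few 2D points agree with the standard Halton values:

Table[Halton[n, 2], {n, 1, 4}]

{{1/2, 1/3}, {1/4, 2/3}, {3/4, 1/9}, {1/8, 4/9}}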
I think this produces values which are distributed in x with more uniformity (lower discrepancy) than the pseudorandom case, but I'm happy to be shown that it's wrong by a better example! ;) Of course, I can then replace the sequence with another one, e.g. Faure etc. A rough way I look at the difference is sketched below.
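This isn't a proper discrepancy computation, just the crude bin-count comparison I use to convince myself (10 bins and 10^4 points are arbitrary choices):

n = 10^4;
haltonPts = Table[vanderCorput[k, 2], {k, n}]; (* base-2 van der Corput *)
pseudoPts = Table[Random[], {n}];

(* count the points in each of 10 equal bins; the expected count is n/10 *)
binCounts10[pts_] := Table[Count[pts, p_ /; (i - 1)/10 <= p < i/10], {i, 10}]

Max[Abs[binCounts10[haltonPts] - n/10]] (* stays small for the Halton points *)
Max[Abs[binCounts10[pseudoPts] - n/10]] (* typically noticeably larger *)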
>
> A couple of final thoughts. Creating a pseudo random number
> generator with the desired properties is anything but a
> trivial task. It is extremely easy to create a very bad
> generator. Bad meaning the numbers so generated have few if
> any of the desired properties. Knuth discusses several
> examples of bad generators.
Absolutely! It's a good book.
Hywel