Re: Re: Random sampling of an arbitrary distribution
- To: mathgroup at smc.vnet.net
- Subject: [mg59950] Re: [mg59925] Re: Random sampling of an arbitrary distribution
- From: "Owen, HL \(Hywel\)" <h.l.owen at dl.ac.uk>
- Date: Fri, 26 Aug 2005 04:53:41 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Hi Bill,
Thanks for the reply. First off, I should point out that I am a physicist (i.e. small maths brain!) rather than a mathematician (big maths brain). I guess it shows! ;) As you can see, I'm a beginner in the area of random sampling: I'm trying to get to grips with pseudo- and quasi-random sequences, and am reading up on low-discrepancy sequences at the moment. My particular interest is in how the choice of sequence type affects a simulation. I have a simulation which physically has a modulation as below, although I chose that particular function as a plausible one rather than it 'having' to be that one. But it appears to me at the moment that it integrates OK (see below):
>
> >ModulatedGaussian[x_, mu_, sigma_, modamp_, modlambda_, modphase_] :=
> >  (1 + modamp*Sin[modphase + (2*Pi*x)/modlambda])/
> >   (E^((-mu + x)^2/(2*sigma^2))*Sqrt[2*Pi]*sigma)
>
> Are you aware the expression above cannot be a density
> function? An expression for a density function must integrate
> to 1 over its domain. To see that this expression does not
> satisfy this in general, set modphase to 0 and modlambda to 1.
> Now note the expression can be written as the sum of the pdf
> for a normal distribution and Sin[2 Pi x] times the pdf for a
> normal distribution. The integral of the latter term is the
> expected value of Sin[2 Pi x] when x has a normal distribution.
>
> Using mathStatica,
>
> f = (1/(\[Sigma]*Sqrt[2*Pi]))*
>     Exp[-((x - \[Mu])^2/(2*\[Sigma]^2))];
> domain[f] = {x, -Infinity, Infinity} &&
>     {\[Mu] \[Element] Reals, \[Sigma] > 0};
>
> Expect[a*Sin[2*Pi*x], f]
>
> (a*Sin[2*Pi*\[Mu]])/E^(2*Pi^2*\[Sigma]^2)
>
> That is, the integral over the domain does not evaluate to 1.
That's weird, because when I integrate it:
Assuming[x \[Element] Reals && mu > 0 && sigma > 0 && modamp > 0 &&
   modlambda > 0 && modphase \[Element] Reals,
  Integrate[ModulatedGaussian[x, mu, sigma, modamp, modlambda, modphase],
   {x, -Infinity, Infinity}]]
I get 1, e.g.

1 + (modamp*Sin[modphase + (2*mu*Pi)/modlambda])/
    E^((2*Pi^2*sigma^2)/modlambda^2) /.
  {modlambda -> 1, modphase -> 0, mu -> 1, sigma -> 1}
I accept that you're right, but I don't see why!
> Hence the usual methods for generating pseudo random numbers
> cannot be relied on in this instance.
>
> >My original method is:
>
> >MonteCarloSample[distfn_, distmin_, distmax_] := Module[{a, b},
> >  While[(a = Random[Real, {distmin, distmax}]; b = Random[];
> >    b > distfn[a])]; a]
>
> >MonteCarloEnsemble[numberofsamples_, distfn_, distmin_, distmax_] :=
> >  Table[MonteCarloSample[distfn, distmin, distmax], {numberofsamples}]
>
> >pts1 = MonteCarloEnsemble[10^4,
> >  ModulatedGaussian[#, 0.0, 1.0, 1.0, 0.6, 0] &, -5, 5];
>
> Normally, when someone speaks of getting random numbers from
> a specified density function they mean that the probability of
> getting a number less than a specified value is equal to the
> cumulative distribution function evaluated at that specified
> value. What you are doing above is distinctly different. You
> are selecting random points conditioned on having the value
> of an expression (intended to be the density function?) at
> that point less than another random point. This is
> not equivalent to generating random numbers with a specified
> density function.
Yes, but I thought that since the sampled points match the density function, it was a valid method. Is there a bias or something that isn't visible?
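For concreteness, here is the kind of check I have in mind (just a sketch using the definitions quoted above; the 0.25-wide bins and the parameter values are my own arbitrary choices): compare the fraction of samples falling in each bin with the integral of the density over that bin.

pts = MonteCarloEnsemble[10^4,
   ModulatedGaussian[#, 0.0, 1.0, 1.0, 0.6, 0] &, -5, 5];

(* fraction of the samples falling in the bin [lo, lo + 0.25) *)
binFraction[lo_] := N[Count[pts, p_ /; lo <= p < lo + 0.25]/Length[pts]]

(* probability mass of the density over the same bin *)
binProb[lo_] := NIntegrate[
   ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0], {x, lo, lo + 0.25}]

(* the two columns should agree to within sampling noise *)
Table[{lo, binFraction[lo], binProb[lo]}, {lo, -2., 2., 0.25}]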
>
> If all you need is to sample an expression at random points
> over its domain then simply doing
>
> f[Random[Real,{a,b}]] will do when a and b are finite.
No, that's not what I'm trying to do. I'm trying to create a set of sample points whose density varies according to the density function I'm using. If f is the inverse CDF then your method works - whether f is analytic or an interpolation. I've tried it for the modulated Gaussian case and it appears to work for me, at least when using an Interpolation for the inverse CDF.
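In case it's useful, this is roughly what I do (a sketch only; the 0.1 grid spacing and the truncation to [-5, 5] are arbitrary choices of mine):

(* tabulate the CDF of the modulated Gaussian, truncated to [-5, 5] *)
cdfData = Table[{NIntegrate[ModulatedGaussian[x, 0.0, 1.0, 1.0, 0.6, 0],
      {x, -5, u}], u}, {u, -4.9, 4.9, 0.1}];

(* interpolating x against the CDF value gives the inverse CDF *)
invCDF = Interpolation[cdfData];

(* feed uniform deviates through the inverse CDF, restricted to the
   tabulated range so the interpolation never extrapolates *)
{cdfmin, cdfmax} = {cdfData[[1, 1]], cdfData[[-1, 1]]};
samples = Table[invCDF[Random[Real, {cdfmin, cdfmax}]], {10^4}];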
> OTOH, if you do want random numbers from a specified density
> function it is possible to achieve this when the cumulative
> distribution function cannot be easily evaluated. But rather
> than attempt to explain the possibilities here I strongly
> suggest you consult established texts. An excellent starting
> point is Knuth's Seminumerical Algorithms Vol 2.
Yes, I have it in front of me now!
My sampling method above appears to me to be a form of von Neumann's rejection method, so I can't see why it isn't valid statistically. To look at the difference between pseudo- and quasi-random sequences, I am also trying:
QuasiRandomMonteCarloEnsemble[numberofsamples_, distfn_, distmin_, distmax_] :=
  Module[{a, b, samples = {}, sequencenumber},
   For[sequencenumber = 1, Length[samples] < numberofsamples, sequencenumber++,
    (* take the next 2D Halton point: a picks the abscissa,
       b drives the rejection test *)
    {a, b} = Halton[sequencenumber, 2];
    If[b <= distfn[distmin + (distmax - distmin) a],
     AppendTo[samples, distmin + (distmax - distmin) a]]];
   samples]
where
Halton[n_, s_] := vanderCorput[n, Prime[Range[s]]];
and
vanderCorput[n_, b_] :=
  IntegerDigits[n, b] . (b^Range[-Floor[Log[b, n] + 1], -1]);
Attributes[vanderCorput] = Listable;
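As a quick check, the first few 2D points agree with the standard Halton values:

Table[Halton[n, 2], {n, 1, 4}]

{{1/2, 1/3}, {1/4, 2/3}, {3/4, 1/9}, {1/8, 4/9}}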
I think this produces values which are distributed in x with more uniformity (lower discrepancy) than the pseudorandom case, but I'm happy to be shown that it's wrong by a better example! ;) Of course, I can then replace the sequence with another one, e.g. Faure etc. A rough way I look at the difference is sketched below.
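This isn't a proper discrepancy computation, just the crude bin-count comparison I use to convince myself (10 bins and 10^4 points are arbitrary choices):

n = 10^4;
haltonPts = Table[vanderCorput[k, 2], {k, n}]; (* base-2 van der Corput *)
pseudoPts = Table[Random[], {n}];

(* count the points in each of 10 equal bins; the expected count is n/10 *)
binCounts10[pts_] := Table[Count[pts, p_ /; (i - 1)/10 <= p < i/10], {i, 10}]

Max[Abs[binCounts10[haltonPts] - n/10]] (* stays small for the Halton points *)
Max[Abs[binCounts10[pseudoPts] - n/10]] (* typically noticeably larger *)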
>
> A couple of final thoughts. Creating a pseudo random number
> generator with the desired properties is anything but a
> trivial task. It is extremely easy to create a very bad
> generator. Bad meaning the numbers so generated have few if
> any of the desired properties. Knuth discusses several
> examples of bad generators.
Absolutely! It's a good book.
Hywel