MathGroup Archive: June 2001 [00223]

[Date Index] [Thread Index] [Author Index]

Re: calculating a distribution of separation of two means

To: mathgroup at smc.vnet.net
Subject: [mg29376] Re: [mg29356] calculating a distribution of separation of two means
From: BobHanlon at aol.com
Date: Fri, 15 Jun 2001 02:23:45 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

In a message dated 2001/6/14 3:42:47 AM, pmhowe at lanl.gov writes:

>I have a small problem I have been working on, and would like your 
>insights and suggestions:   I'd like to calculate two distributions, 
>and then calculate a measure of how the difference in the means vary 
>as a function of the nature of the two distributions.
>
>This loads some packages I use.
>
>Needs["Statistics`ContinuousDistributions`"];
>Needs["Statistics`NormalDistribution`"];
><<NumericalMath`ListIntegrate`;
><< Graphics`Graphics`
><<Statistics`DescriptiveStatistics`;
>
>
>Here are two normal distributions, with means 20 and 15, with some 
>arbitrarily chosen standard deviations.
>
>\[Mu]1 = 20;
>\[Sigma]1 = .1;
>\[Mu]2 = 15;
>\[Sigma]2 = 1;
>
>d1 = NormalDistribution[\[Mu]1, \[Sigma]1];
>d2 = NormalDistribution[\[Mu]2, \[Sigma]2];
>
>
>
>Here is a distribution of the differences.  A histogram can be made 
>of the distribution of the differences. The histogram would have to 
>be normalized in order to use it as a probability density function.
>
>Clear[h1,d3,div];
>n=1000;
>d3=(RandomArray[d1,n]-RandomArray[d2,n]);
>
>ListPlot[d3];
>h1=Histogram[d3];
>
>
>Below, I calculate the distribution of the means in the following 
>way:  I sample the two distributions n times, and calculate n 
>differences (d3). Naturally, I want to use values of n  in the 
>multiple thousands, but I used a small n and left off the semicolons 
>for you to see what the output looks like.
>
>Clear[h1,j,d3,t1,t2,t3];
>n=10;
>d3=(RandomArray[d1,n]-RandomArray[d2,n])
>
>
>  I then place these differences into n equal sized bins (t1). This is
>
>a very slow step when n is large. The inefficiency is at least 
>partially due to the generation of lots of empty lists.  I don't know 
>how to get around it. Suggestions are welcome.
>
>t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}]
>
>I now add up all the values within a bin, and calculate the mean 
>value.  I do that for each bin (t2). There is a nice side effect, in 
>that  this eliminates all those empty lists.  They are still present 
>with zeros in them, so I haven't gained much except, perhaps, 
>esthetics.
>
>t2=Table[{t1[[j]][[2]],Plus@@t1[[j]][[1]]},{j,1,Length[t1]}];
>div=ListIntegrate[t2,1]
>
>  I now normalize the table, to be able to treat it as a probability 
>density function (t3).
>
>t3 = t2 /. {x_, y_} -> {x, y/div};
>
>pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}},
>       PlotStyle -> PointSize[0.02]];
>h3 = Histogram[d3];
>
>margin = \[Mu]1/\[Mu]2 - 1
>safetyFactor = \[Mu]1/\[Mu]2
>StandardDeviation[d3]
>Mean[d3]
>
>
>The output looks reasonable to me, and I think it is correct 
>(although, as is probably obvious to you, I am no statistician). My 
>coding makes the calculation quite slow.  I'd appreciate suggestions 
>for speed-up.  Also, this calculation must be fairly common, and I 
>bet there are much better ways of approaching it.  Your input will 
>be appreciated.  (I'd also appreciate a better way to send questions 
>like this to you.  For example, I placed (* and *) around all my 
>comments, thinking that you could just copy and paste everything into 
>a fresh notebook and run it.  When I did this (I'm running v4.1 on a 
>MacIntosh G4),  Mathematica picked up some of the (* and *) and 
>ignored the others.)
>

Needs["Statistics`NormalDistribution`"];

d1 = NormalDistribution[m1, s1];
d2 = NormalDistribution[m2, s2];

Let W = X - Y where X and Y are independent normal random variables. 
Then the PDF for W is given by

Integrate[PDF[d1, w+y]*PDF[d2, y], {y, -Infinity, Infinity}, 
    GenerateConditions -> False]

1/(E^((-m1 + m2 + w)^2/
     (2*(s1^2 + s2^2)))*(Sqrt[2*Pi]*s1*
    Sqrt[1/s2^2 + 1/s1^2]*s2))

This is just NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]]

FullSimplify[% == PDF[NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]], w], 
  Element[{s1, s2}, Reals] && s1 > 0 && s2 > 0]

True

n = 1000; 
data = RandomArray[d1 /. {m1 -> 20, s1 -> 0.1}, n] - 
      RandomArray[d2 /. {m2 -> 15, s2 -> 1.0}, n];

The maximum likelihood estimates for the mean and standard deviation 
from the data are 

{Mean[data], StandardDeviationMLE[data]}

{5.002175897096748, 0.9613025050188039}

This is equivalent (but much faster and uses much less memory) to solving 
the log likelihood equations

eqns = Thread[(D[Plus @@ (Log[PDF[NormalDistribution[m, s], #]]& /@ 
                      data), #]& /@ {m, s}) == 0];

FindRoot[eqns, {m, Mean[data]}, {s, 1.}]

{m -> 5.002175897096749, 
  s -> 0.9613025050188035}

This compares with

{m1-m2, Sqrt[s1^2+s2^2]} /. 
  {m1 -> 20, s1 -> 0.1, m2 -> 15, s2 -> 1.0}

{5, 1.004987562112089}


Bob Hanlon
Chantilly, VA  USA

Prev by Date: Re: pattern matching quirks

Next by Date: Converting Integer to Binary and using bits

Previous by thread: calculating a distribution of separation of two means

Next by thread: AMD vs. Intel Floating Point