       Re: calculating a distribution of separation of two means

• To: mathgroup at smc.vnet.net
• Subject: [mg29376] Re: [mg29356] calculating a distribution of separation of two means
• From: BobHanlon at aol.com
• Date: Fri, 15 Jun 2001 02:23:45 -0400 (EDT)
• Sender: owner-wri-mathgroup at wolfram.com

```In a message dated 2001/6/14 3:42:47 AM, pmhowe at lanl.gov writes:

>I have a small problem I have been working on, and would like your
>insights and suggestions:   I'd like to calculate two distributions,
>and then calculate a measure of how the difference in the means vary
>as a function of the nature of the two distributions.
>
>This loads some packages I use.
>
>Needs["Statistics`ContinuousDistributions`"];
>Needs["Statistics`NormalDistribution`"];
><<NumericalMath`ListIntegrate`;
><< Graphics`Graphics`
><<Statistics`DescriptiveStatistics`;
>
>
>Here are two normal distributions, with means 20 and 15, with some
>arbitrarily chosen standard deviations.
>
>\[Mu]1 = 20;
>\[Sigma]1 = .1;
>\[Mu]2 = 15;
>\[Sigma]2 = 1;
>
>d1 = NormalDistribution[\[Mu]1, \[Sigma]1];
>d2 = NormalDistribution[\[Mu]2, \[Sigma]2];
>
>
>
>Here is a distribution of the differences.  A histogram can be made
>of the distribution of the differences. The histogram would have to
>be normalized in order to use it as a probability density function.
>
>Clear[h1,d3,div];
>n=1000;
>d3=(RandomArray[d1,n]-RandomArray[d2,n]);
>
>ListPlot[d3];
>h1=Histogram[d3];
>
>
>Below, I calculate the distribution of the means in the following
>way:  I sample the two distributions n times, and calculate n
>differences (d3). Naturally, I want to use values of n  in the
>multiple thousands, but I used a small n and left off the semicolons
>for you to see what the output looks like.
>
>Clear[h1,j,d3,t1,t2,t3];
>n=10;
>d3=(RandomArray[d1,n]-RandomArray[d2,n])
>
>
>  I then place these differences into n equal sized bins (t1). This is
>
>a very slow step when n is large. The inefficiency is at least
>partially due to the generation of lots of empty lists.  I don't know
>how to get around it. Suggestions are welcome.
>
>t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}]
>
>I now add up all the values within a bin, and calculate the mean
>value.  I do that for each bin (t2). There is a nice side effect, in
>that  this eliminates all those empty lists.  They are still present
>with zeros in them, so I haven't gained much except, perhaps,
>esthetics.
>
>t2=Table[{t1[[j]][],Plus@@t1[[j]][]},{j,1,Length[t1]}];
>div=ListIntegrate[t2,1]
>
>  I now normalize the table, to be able to treat it as a probability
>density function (t3).
>
>t3 = t2 /. {x_, y_} -> {x, y/div};
>
>pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}},
>       PlotStyle -> PointSize[0.02]];
>h3 = Histogram[d3];
>
>margin = \[Mu]1/\[Mu]2 - 1
>safetyFactor = \[Mu]1/\[Mu]2
>StandardDeviation[d3]
>Mean[d3]
>
>
>The output looks reasonable to me, and I think it is correct
>(although, as is probably obvious to you, I am no statistician). My
>coding makes the calculation quite slow.  I'd appreciate suggestions
>for speed-up.  Also, this calculation must be fairly common, and I
>bet there are much better ways of approaching it.  Your input will
>be appreciated.  (I'd also appreciate a better way to send questions
>like this to you.  For example, I placed (* and *) around all my
>comments, thinking that you could just copy and paste everything into
>a fresh notebook and run it.  When I did this (I'm running v4.1 on a
>MacIntosh G4),  Mathematica picked up some of the (* and *) and
>ignored the others.)
>

Needs["Statistics`NormalDistribution`"];

d1 = NormalDistribution[m1, s1];
d2 = NormalDistribution[m2, s2];

Let W = X - Y where X and Y are independent normal random variables.
Then the PDF for W is given by

Integrate[PDF[d1, w+y]*PDF[d2, y], {y, -Infinity, Infinity},
GenerateConditions -> False]

1/(E^((-m1 + m2 + w)^2/
(2*(s1^2 + s2^2)))*(Sqrt[2*Pi]*s1*
Sqrt[1/s2^2 + 1/s1^2]*s2))

This is just NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]]

FullSimplify[% == PDF[NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]], w],
Element[{s1, s2}, Reals] && s1 > 0 && s2 > 0]

True

n = 1000;
data = RandomArray[d1 /. {m1 -> 20, s1 -> 0.1}, n] -
RandomArray[d2 /. {m2 -> 15, s2 -> 1.0}, n];

The maximum likelihood estimates for the mean and standard deviation
from the data are

{Mean[data], StandardDeviationMLE[data]}

{5.002175897096748, 0.9613025050188039}

This is equivalent (but much faster and uses much less memory) to solving
the log likelihood equations

eqns = Thread[(D[Plus @@ (Log[PDF[NormalDistribution[m, s], #]]& /@
data), #]& /@ {m, s}) == 0];

FindRoot[eqns, {m, Mean[data]}, {s, 1.}]

{m -> 5.002175897096749,
s -> 0.9613025050188035}

This compares with

{m1-m2, Sqrt[s1^2+s2^2]} /.
{m1 -> 20, s1 -> 0.1, m2 -> 15, s2 -> 1.0}

{5, 1.004987562112089}

Bob Hanlon
Chantilly, VA  USA

```

• Prev by Date: Re: pattern matching quirks
• Next by Date: Converting Integer to Binary and using bits
• Previous by thread: calculating a distribution of separation of two means
• Next by thread: AMD vs. Intel Floating Point