MathGroup Archive: June 2001 [00207]

[Date Index] [Thread Index] [Author Index]

calculating a distribution of separation of two means

To: mathgroup at smc.vnet.net
Subject: [mg29356] calculating a distribution of separation of two means
From: "Philip M. Howe" <pmhowe at lanl.gov>
Date: Thu, 14 Jun 2001 02:27:31 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

Dear Colleagues,

I have a small problem I have been working on, and would like your 
insights and suggestions:   I'd like to calculate two distributions, 
and then calculate a measure of how the difference in the means vary 
as a function of the nature of the two distributions.

This loads some packages I use.

Needs["Statistics`ContinuousDistributions`"];
Needs["Statistics`NormalDistribution`"];
<<NumericalMath`ListIntegrate`;
<< Graphics`Graphics`
<<Statistics`DescriptiveStatistics`;


Here are two normal distributions, with means 20 and 15, with some 
arbitrarily chosen standard deviations.

\[Mu]1 = 20;
\[Sigma]1 = .1;
\[Mu]2 = 15;
\[Sigma]2 = 1;

d1 = NormalDistribution[\[Mu]1, \[Sigma]1];
d2 = NormalDistribution[\[Mu]2, \[Sigma]2];



Here is a distribution of the differences.  A histogram can be made 
of the distribution of the differences. The histogram would have to 
be normalized in order to use it as a probability density function.

Clear[h1,d3,div];
n=1000;
d3=(RandomArray[d1,n]-RandomArray[d2,n]);

ListPlot[d3];
h1=Histogram[d3];


Below, I calculate the distribution of the means in the following 
way:  I sample the two distributions n times, and calculate n 
differences (d3). Naturally, I want to use values of n  in the 
multiple thousands, but I used a small n and left off the semicolons 
for you to see what the output looks like.

Clear[h1,j,d3,t1,t2,t3];
n=10;
d3=(RandomArray[d1,n]-RandomArray[d2,n])


  I then place these differences into n equal sized bins (t1). This is 
a very slow step when n is large. The inefficiency is at least 
partially due to the generation of lots of empty lists.  I don't know 
how to get around it. Suggestions are welcome.

t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}]

I now add up all the values within a bin, and calculate the mean 
value.  I do that for each bin (t2). There is a nice side effect, in 
that  this eliminates all those empty lists.  They are still present 
with zeros in them, so I haven't gained much except, perhaps, 
esthetics.

t2=Table[{t1[[j]][[2]],Plus@@t1[[j]][[1]]},{j,1,Length[t1]}];
div=ListIntegrate[t2,1]

  I now normalize the table, to be able to treat it as a probability 
density function (t3).

t3 = t2 /. {x_, y_} -> {x, y/div};

pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}},
       PlotStyle -> PointSize[0.02]];
h3 = Histogram[d3];

margin = \[Mu]1/\[Mu]2 - 1
safetyFactor = \[Mu]1/\[Mu]2
StandardDeviation[d3]
Mean[d3]


The output looks reasonable to me, and I think it is correct 
(although, as is probably obvious to you, I am no statistician). My 
coding makes the calculation quite slow.  I'd appreciate suggestions 
for speed-up.  Also, this calculation must be fairly common, and I 
bet there are much better ways of approaching it.  Your input will 
be appreciated.  (I'd also appreciate a better way to send questions 
like this to you.  For example, I placed (* and *) around all my 
comments, thinking that you could just copy and paste everything into 
a fresh notebook and run it.  When I did this (I'm running v4.1 on a 
MacIntosh G4),  Mathematica picked up some of the (* and *) and 
ignored the others.)

Thanks,

Phil
-- 
Philip M. Howe
Program Manager, Stockpile Surety
Los Alamos National Laboratory

(505) 665-5332
(505) 667-9498
Fax: 505-665-5249
email pmhowe at lanl.gov
Mail Stop F606

Prev by Date: Re: display factored form

Next by Date: AMD vs. Intel Floating Point

Previous by thread: Re: pattern matching quirks

Next by thread: Re: calculating a distribution of separation of two means