calculating a distribution of separation of two means
- To: mathgroup at smc.vnet.net
- Subject: [mg29356] calculating a distribution of separation of two means
- From: "Philip M. Howe" <pmhowe at lanl.gov>
- Date: Thu, 14 Jun 2001 02:27:31 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Dear Colleagues,
I have a small problem I have been working on, and would like your
insights and suggestions: I'd like to calculate two distributions,
and then calculate a measure of how the difference in the means vary
as a function of the nature of the two distributions.
This loads some packages I use.
Needs["Statistics`ContinuousDistributions`"];
Needs["Statistics`NormalDistribution`"];
<<NumericalMath`ListIntegrate`;
<< Graphics`Graphics`
<<Statistics`DescriptiveStatistics`;
Here are two normal distributions, with means 20 and 15, with some
arbitrarily chosen standard deviations.
\[Mu]1 = 20;
\[Sigma]1 = .1;
\[Mu]2 = 15;
\[Sigma]2 = 1;
d1 = NormalDistribution[\[Mu]1, \[Sigma]1];
d2 = NormalDistribution[\[Mu]2, \[Sigma]2];
Here is a distribution of the differences. A histogram can be made
of the distribution of the differences. The histogram would have to
be normalized in order to use it as a probability density function.
Clear[h1,d3,div];
n=1000;
d3=(RandomArray[d1,n]-RandomArray[d2,n]);
ListPlot[d3];
h1=Histogram[d3];
Below, I calculate the distribution of the means in the following
way: I sample the two distributions n times, and calculate n
differences (d3). Naturally, I want to use values of n in the
multiple thousands, but I used a small n and left off the semicolons
for you to see what the output looks like.
Clear[h1,j,d3,t1,t2,t3];
n=10;
d3=(RandomArray[d1,n]-RandomArray[d2,n])
I then place these differences into n equal sized bins (t1). This is
a very slow step when n is large. The inefficiency is at least
partially due to the generation of lots of empty lists. I don't know
how to get around it. Suggestions are welcome.
t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}]
I now add up all the values within a bin, and calculate the mean
value. I do that for each bin (t2). There is a nice side effect, in
that this eliminates all those empty lists. They are still present
with zeros in them, so I haven't gained much except, perhaps,
esthetics.
t2=Table[{t1[[j]][[2]],Plus@@t1[[j]][[1]]},{j,1,Length[t1]}];
div=ListIntegrate[t2,1]
I now normalize the table, to be able to treat it as a probability
density function (t3).
t3 = t2 /. {x_, y_} -> {x, y/div};
pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}},
PlotStyle -> PointSize[0.02]];
h3 = Histogram[d3];
margin = \[Mu]1/\[Mu]2 - 1
safetyFactor = \[Mu]1/\[Mu]2
StandardDeviation[d3]
Mean[d3]
The output looks reasonable to me, and I think it is correct
(although, as is probably obvious to you, I am no statistician). My
coding makes the calculation quite slow. I'd appreciate suggestions
for speed-up. Also, this calculation must be fairly common, and I
bet there are much better ways of approaching it. Your input will
be appreciated. (I'd also appreciate a better way to send questions
like this to you. For example, I placed (* and *) around all my
comments, thinking that you could just copy and paste everything into
a fresh notebook and run it. When I did this (I'm running v4.1 on a
MacIntosh G4), Mathematica picked up some of the (* and *) and
ignored the others.)
Thanks,
Phil
--
Philip M. Howe
Program Manager, Stockpile Surety
Los Alamos National Laboratory
(505) 665-5332
(505) 667-9498
Fax: 505-665-5249
email pmhowe at lanl.gov
Mail Stop F606