Re: calculating a distribution of separation of two means
- To: mathgroup at smc.vnet.net
- Subject: [mg29376] Re: [mg29356] calculating a distribution of separation of two means
- From: BobHanlon at aol.com
- Date: Fri, 15 Jun 2001 02:23:45 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
In a message dated 2001/6/14 3:42:47 AM, pmhowe at lanl.gov writes: >I have a small problem I have been working on, and would like your >insights and suggestions: I'd like to calculate two distributions, >and then calculate a measure of how the difference in the means vary >as a function of the nature of the two distributions. > >This loads some packages I use. > >Needs["Statistics`ContinuousDistributions`"]; >Needs["Statistics`NormalDistribution`"]; ><<NumericalMath`ListIntegrate`; ><< Graphics`Graphics` ><<Statistics`DescriptiveStatistics`; > > >Here are two normal distributions, with means 20 and 15, with some >arbitrarily chosen standard deviations. > >\[Mu]1 = 20; >\[Sigma]1 = .1; >\[Mu]2 = 15; >\[Sigma]2 = 1; > >d1 = NormalDistribution[\[Mu]1, \[Sigma]1]; >d2 = NormalDistribution[\[Mu]2, \[Sigma]2]; > > > >Here is a distribution of the differences. A histogram can be made >of the distribution of the differences. The histogram would have to >be normalized in order to use it as a probability density function. > >Clear[h1,d3,div]; >n=1000; >d3=(RandomArray[d1,n]-RandomArray[d2,n]); > >ListPlot[d3]; >h1=Histogram[d3]; > > >Below, I calculate the distribution of the means in the following >way: I sample the two distributions n times, and calculate n >differences (d3). Naturally, I want to use values of n in the >multiple thousands, but I used a small n and left off the semicolons >for you to see what the output looks like. > >Clear[h1,j,d3,t1,t2,t3]; >n=10; >d3=(RandomArray[d1,n]-RandomArray[d2,n]) > > > I then place these differences into n equal sized bins (t1). This is > >a very slow step when n is large. The inefficiency is at least >partially due to the generation of lots of empty lists. I don't know >how to get around it. Suggestions are welcome. > >t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}] > >I now add up all the values within a bin, and calculate the mean >value. I do that for each bin (t2). There is a nice side effect, in >that this eliminates all those empty lists. They are still present >with zeros in them, so I haven't gained much except, perhaps, >esthetics. > >t2=Table[{t1[[j]][[2]],Plus@@t1[[j]][[1]]},{j,1,Length[t1]}]; >div=ListIntegrate[t2,1] > > I now normalize the table, to be able to treat it as a probability >density function (t3). > >t3 = t2 /. {x_, y_} -> {x, y/div}; > >pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}}, > PlotStyle -> PointSize[0.02]]; >h3 = Histogram[d3]; > >margin = \[Mu]1/\[Mu]2 - 1 >safetyFactor = \[Mu]1/\[Mu]2 >StandardDeviation[d3] >Mean[d3] > > >The output looks reasonable to me, and I think it is correct >(although, as is probably obvious to you, I am no statistician). My >coding makes the calculation quite slow. I'd appreciate suggestions >for speed-up. Also, this calculation must be fairly common, and I >bet there are much better ways of approaching it. Your input will >be appreciated. (I'd also appreciate a better way to send questions >like this to you. For example, I placed (* and *) around all my >comments, thinking that you could just copy and paste everything into >a fresh notebook and run it. When I did this (I'm running v4.1 on a >MacIntosh G4), Mathematica picked up some of the (* and *) and >ignored the others.) > Needs["Statistics`NormalDistribution`"]; d1 = NormalDistribution[m1, s1]; d2 = NormalDistribution[m2, s2]; Let W = X - Y where X and Y are independent normal random variables. Then the PDF for W is given by Integrate[PDF[d1, w+y]*PDF[d2, y], {y, -Infinity, Infinity}, GenerateConditions -> False] 1/(E^((-m1 + m2 + w)^2/ (2*(s1^2 + s2^2)))*(Sqrt[2*Pi]*s1* Sqrt[1/s2^2 + 1/s1^2]*s2)) This is just NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]] FullSimplify[% == PDF[NormalDistribution[m1-m2, Sqrt[s1^2+s2^2]], w], Element[{s1, s2}, Reals] && s1 > 0 && s2 > 0] True n = 1000; data = RandomArray[d1 /. {m1 -> 20, s1 -> 0.1}, n] - RandomArray[d2 /. {m2 -> 15, s2 -> 1.0}, n]; The maximum likelihood estimates for the mean and standard deviation from the data are {Mean[data], StandardDeviationMLE[data]} {5.002175897096748, 0.9613025050188039} This is equivalent (but much faster and uses much less memory) to solving the log likelihood equations eqns = Thread[(D[Plus @@ (Log[PDF[NormalDistribution[m, s], #]]& /@ data), #]& /@ {m, s}) == 0]; FindRoot[eqns, {m, Mean[data]}, {s, 1.}] {m -> 5.002175897096749, s -> 0.9613025050188035} This compares with {m1-m2, Sqrt[s1^2+s2^2]} /. {m1 -> 20, s1 -> 0.1, m2 -> 15, s2 -> 1.0} {5, 1.004987562112089} Bob Hanlon Chantilly, VA USA