calculating a distribution of separation of two means
- To: mathgroup at smc.vnet.net
- Subject: [mg29356] calculating a distribution of separation of two means
- From: "Philip M. Howe" <pmhowe at lanl.gov>
- Date: Thu, 14 Jun 2001 02:27:31 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Dear Colleagues, I have a small problem I have been working on, and would like your insights and suggestions: I'd like to calculate two distributions, and then calculate a measure of how the difference in the means vary as a function of the nature of the two distributions. This loads some packages I use. Needs["Statistics`ContinuousDistributions`"]; Needs["Statistics`NormalDistribution`"]; <<NumericalMath`ListIntegrate`; << Graphics`Graphics` <<Statistics`DescriptiveStatistics`; Here are two normal distributions, with means 20 and 15, with some arbitrarily chosen standard deviations. \[Mu]1 = 20; \[Sigma]1 = .1; \[Mu]2 = 15; \[Sigma]2 = 1; d1 = NormalDistribution[\[Mu]1, \[Sigma]1]; d2 = NormalDistribution[\[Mu]2, \[Sigma]2]; Here is a distribution of the differences. A histogram can be made of the distribution of the differences. The histogram would have to be normalized in order to use it as a probability density function. Clear[h1,d3,div]; n=1000; d3=(RandomArray[d1,n]-RandomArray[d2,n]); ListPlot[d3]; h1=Histogram[d3]; Below, I calculate the distribution of the means in the following way: I sample the two distributions n times, and calculate n differences (d3). Naturally, I want to use values of n in the multiple thousands, but I used a small n and left off the semicolons for you to see what the output looks like. Clear[h1,j,d3,t1,t2,t3]; n=10; d3=(RandomArray[d1,n]-RandomArray[d2,n]) I then place these differences into n equal sized bins (t1). This is a very slow step when n is large. The inefficiency is at least partially due to the generation of lots of empty lists. I don't know how to get around it. Suggestions are welcome. t1=Table[{Select[d3,(n-1<#<n)&],n},{n,1,Length[d3]}] I now add up all the values within a bin, and calculate the mean value. I do that for each bin (t2). There is a nice side effect, in that this eliminates all those empty lists. They are still present with zeros in them, so I haven't gained much except, perhaps, esthetics. t2=Table[{t1[[j]][[2]],Plus@@t1[[j]][[1]]},{j,1,Length[t1]}]; div=ListIntegrate[t2,1] I now normalize the table, to be able to treat it as a probability density function (t3). t3 = t2 /. {x_, y_} -> {x, y/div}; pp4 = ListPlot[t3, PlotRange -> {{0, 10}, {0, .40}}, PlotStyle -> PointSize[0.02]]; h3 = Histogram[d3]; margin = \[Mu]1/\[Mu]2 - 1 safetyFactor = \[Mu]1/\[Mu]2 StandardDeviation[d3] Mean[d3] The output looks reasonable to me, and I think it is correct (although, as is probably obvious to you, I am no statistician). My coding makes the calculation quite slow. I'd appreciate suggestions for speed-up. Also, this calculation must be fairly common, and I bet there are much better ways of approaching it. Your input will be appreciated. (I'd also appreciate a better way to send questions like this to you. For example, I placed (* and *) around all my comments, thinking that you could just copy and paste everything into a fresh notebook and run it. When I did this (I'm running v4.1 on a MacIntosh G4), Mathematica picked up some of the (* and *) and ignored the others.) Thanks, Phil -- Philip M. Howe Program Manager, Stockpile Surety Los Alamos National Laboratory (505) 665-5332 (505) 667-9498 Fax: 505-665-5249 email pmhowe at lanl.gov Mail Stop F606