Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122639] Re: Problems with DistributionFitTest
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Fri, 4 Nov 2011 06:00:36 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201111010502.AAA14754@smc.vnet.net>
- Reply-to: drmajorbob at yahoo.com
I like this one:

numTests = 1000;
First@Timing[
   (* sorted {p-value, reject?} pairs; {0, 1} is prepended so the curve starts at 0 *)
   tests = Flatten[{{{0, 1}},
      Sort@Table[{t = DistributionFitTest@
           RandomVariate[NormalDistribution[], 10000],
         Boole[t <= .05]}, {numTests}]}, 1];
   (* empirical quantile function of the p-values *)
   f = Interpolation@
     Thread[{Range[0, numTests]/numTests, tests[[All, 1]]}];
   (* deviation from the uniform quantile function; should stay near 0 *)
   Print@Plot[f@x - x, {x, 0, 1}, ImageSize -> 400];
   (* {mean p-value, fraction of runs with p <= .05} *)
   Print@N@Mean@Rest@tests
   ] "seconds"

Bobby

On Thu, 03 Nov 2011 03:46:19 -0500, Barrie Stokes
<Barrie.Stokes at newcastle.edu.au> wrote:

> Hi Felipe
>
> Can I beg to make a small clarification to Andy's response?
>
> The whole idea of p values and rejection of the Null Hypothesis
> continues to be one in which people get tangled up in logical and
> linguistic knots.
>
> An observed p value does *not* allow one to make a *general* claim
> like "about 3% of the time you can expect to get a test statistic like
> the one you obtained or one even more extreme".
>
> Given the context of this p value, its value being 0.0312946, i.e.,
> less than 0.05, allows a frequentist-classical statistician to say that,
> *on this occasion*, this observed p value enables me to reject the Null
> Hypothesis (which is that the data are Gaussian) at the 5% significance
> level, or some such equivalent phrase.
>
> The important thing here is that, *by construction*, p values are equal
> to or less than 0.05 precisely 5% of the time *when the Null Hypothesis
> holds, i.e., is in fact true*, or "under the Null Hypothesis", as it's
> usually phrased.
>
> When one rejects the Null Hypothesis (having obtained a p value <= 0.05),
> one is in fact betting that, in so doing, one will be wrong only
> 1 time in 20.
>
> If anyone doesn't like this explication, please note that I am a
> Bayesian, so for me to explain a p value is like George Bush explaining
> the meaning of the French word 'entrepreneur'. :-)
>
> (Apparently GB once claimed that the trouble with the French is that
> they don't have a word for 'entrepreneur'. Actually, they do.)
>
> You may find the following code (built on your original code) helpful -
> run it as many times as your patience allows.
>
> numTests = 1000;
> resultsList = {};
> Do[
>   (data = RandomVariate[NormalDistribution[], 10000];
>    AppendTo[ resultsList, DistributionFitTest[data] ];
>   ), {numTests}
> ]
> resultsList // Short
> Length[ Select[ resultsList, (s \[Function] s <= 0.05) ] ]/numTests // N
>
> Cheers
>
> Barrie
>
>>>> On 02/11/2011 at 10:23 pm, in message
>>>> <201111021123.GAA03608 at smc.vnet.net>,
> Andy Ross <andyr at wolfram.com> wrote:
>> This is exactly what you might expect. The p-value from a hypothesis
>> test is itself a random variable. Under the null hypothesis the p-value
>> should follow a UniformDistribution[{0,1}].
>>
>> In your case, the null hypothesis is that the data have been drawn from
>> a normal distribution. What that p-value is really saying is that about
>> 3% of the time you can expect to get a test statistic like the one you
>> obtained or one even more extreme.
>>
>> Andy Ross
>> Wolfram Research
>>
>>
>> On 11/1/2011 12:02 AM, fd wrote:
>>> Dear Group
>>>
>>> I'm not a specialist in statistics, but I spoke to one who found this
>>> behaviour dubious.
>>>
>>> Before using DistributionFitTest I was doing some tests with the
>>> normal distribution, like this
>>>
>>> data = RandomVariate[NormalDistribution[], 10000];
>>>
>>> DistributionFitTest[data]
>>>
>>> 0.0312946
>>>
>>> According to the documentation "A small p-value suggests that it is
>>> unlikely that the data came from dist", and that the test assumes the
>>> data is normally distributed.
>>>
>>> I found this result for the p-value to be really low. If I re-run the
>>> code I often get what I would expect (a number greater than 0.5), but
>>> it is not at all rare to obtain p values smaller than 0.05, or even
>>> smaller. Through multiple re-runs I notice it fluctuates by orders of
>>> magnitude.
>>>
>>> The statistician I consulted with found this weird since the data was
>>> drawn from a normal distribution and the sample size is big,
>>> especially because the Pearson χ² test also fluctuates like this:
>>>
>>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>>
>>> H["TestDataTable", All]
>>>
>>> Is this a real issue?
>>>
>>> Any thoughts?
>>>
>>> Best regards
>>> Felipe

-- 
DrMajorBob at yahoo.com
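
Andy's point in the thread, that under the null hypothesis the p-value behaves like a draw from UniformDistribution[{0,1}], can be checked directly. What follows is only a minimal sketch: the seed, the sample sizes and the names pValues and alpha are illustrative choices that do not come from any of the posts above. It collects p-values from repeated runs on Gaussian data, counts how often they fall at or below 0.05, and then asks DistributionFitTest whether those p-values are themselves consistent with a uniform distribution.

```wolfram
(* Sketch only: seed, sizes and the names pValues/alpha are illustrative
   assumptions, not taken from the thread. *)
SeedRandom[2011];
numTests = 500;
alpha = 0.05;

(* Each run draws fresh Gaussian data and keeps only the p-value. *)
pValues = Table[
   DistributionFitTest[RandomVariate[NormalDistribution[], 1000]],
   {numTests}];

(* Fraction of runs rejected at level alpha; expected to be near alpha. *)
N[Count[pValues, p_ /; p <= alpha]/numTests]

(* Second-level check: are the p-values consistent with U(0, 1)? *)
DistributionFitTest[pValues, UniformDistribution[{0, 1}]]

(* Visual check: the bars should be roughly level. *)
Histogram[pValues, {0, 1, 0.1}]
```

A smaller sample per run (1000 rather than 10000) is used here only to keep the run time short. The second-level p-value is of course itself random, so a single small value there is no more alarming than the 0.0312946 that started the thread.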