Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122611] Re: Problems with DistributionFitTest
- From: Barrie Stokes <Barrie.Stokes at newcastle.edu.au>
- Date: Thu, 3 Nov 2011 03:46:19 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201111010502.AAA14754@smc.vnet.net>
Hi Felipe

Can I beg to make a small clarification to Andy's response?

The whole idea of p values and rejection of the Null Hypothesis continues to be one in which people get tangled up in logical and linguistic knots.

An observed p value does *not* allow one to make a *general* claim like "about 3% of the time you can expect to get a test statistic like the one you obtained or one even more extreme".

In the context of this p value, its being 0.0312946, i.e., less than 0.05, allows a frequentist-classical statistician to say that, *on this occasion*, the observed p value enables me to reject the Null Hypothesis (which is that the data are Gaussian) at the 5% significance level, or some such equivalent phrase.

The important thing here is that, *by construction*, p values are equal to or less than 0.05 precisely 5% of the time *when the Null Hypothesis holds, i.e., is in fact true*, or "under the Null Hypothesis", as it's usually phrased.

When one rejects the Null Hypothesis (having obtained a p value <= 0.05), one is in fact betting that, in so doing, one will be wrong only 1 time in 20.

If anyone doesn't like this explication, please note that I am a Bayesian, so for me to explain a p value is like George Bush explaining the meaning of the French word 'entrepreneur'. :-)

(Apparently GB once claimed that the trouble with the French is that they don't have a word for 'entrepreneur'. Actually, they do.)

You may find the following code (built on your original code) helpful - run it as many times as your patience allows.

numTests = 1000;
resultsList = {};
(* draw a fresh Gaussian sample each time and store the test's p value *)
Do[
  (data = RandomVariate[NormalDistribution[], 10000];
   AppendTo[ resultsList, DistributionFitTest[data] ];
  ), {numTests} ]

resultsList // Short

(* fraction of p values at or below 0.05 *)
Length[ Select[ resultsList, (s \[Function] s <= 0.05) ] ]/numTests // N

Cheers

Barrie

>>> On 02/11/2011 at 10:23 pm, in message <201111021123.GAA03608 at smc.vnet.net>, Andy Ross <andyr at wolfram.com> wrote:
> This is exactly what you might expect.
> The p-value from a hypothesis test is itself a random variable. Under
> the null hypothesis the p-value should follow a
> UniformDistribution[{0,1}].
>
> In your case, the null hypothesis is that the data have been drawn from
> a normal distribution. What that p-value is really saying is that about
> 3% of the time you can expect to get a test statistic like the one you
> obtained or one even more extreme.
>
> Andy Ross
> Wolfram Research
>
> On 11/1/2011 12:02 AM, fd wrote:
>> Dear Group
>>
>> I'm not a specialist in statistics, but I spoke to one who found this
>> behaviour dubious.
>>
>> Before using DistributionFitTest I was doing some tests with the
>> normal distribution, like this
>>
>> data = RandomVariate[NormalDistribution[], 10000];
>>
>> DistributionFitTest[data]
>>
>> 0.0312946
>>
>> According to the documentation "A small p-value suggests that it is
>> unlikely that the data came from dist", and that the test assumes the
>> data is normally distributed
>>
>> I found this result for the p-value to be really low. If I re-run the
>> code I often get what I would expect (a number greater than 0.5), but
>> it is not at all rare to obtain p values smaller than 0.05 and even
>> smaller. Through multiple re-runs I notice it fluctuates by orders of
>> magnitude.
>>
>> The statistician I consulted found this weird, since the data was
>> drawn from a normal distribution and the sample size is big,
>> especially because the Pearson chi-squared test also fluctuates like
>> this:
>>
>> H = DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>
>> H["TestDataTable", All]
>>
>> Is this a real issue?
>>
>> Any thoughts?
>>
>> Best regards
>> Felipe
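
A footnote to the point quoted above, that under the null hypothesis the p-value should follow UniformDistribution[{0,1}]: this can be checked empirically with a minimal sketch along the following lines. The names pValues and sampleSize, the sample size of 1000 (smaller than the 10000 used in the thread, only so that it runs faster), the bin width and the list of significance levels are all illustrative assumptions, not part of the original code.

```mathematica
(* Sketch: collect p-values from repeated tests on data that really are Gaussian *)
numTests = 1000;    (* number of repeated tests; illustrative *)
sampleSize = 1000;  (* assumed smaller than the thread's 10000, for speed *)

pValues = Table[
   DistributionFitTest[RandomVariate[NormalDistribution[], sampleSize]],
   {numTests}];

(* If the p-values are uniform on [0, 1], this histogram should look roughly flat *)
Histogram[pValues, {0, 1, 0.05}]

(* Fraction of p-values at or below each level alpha;
   under the null hypothesis each fraction should land near alpha itself *)
Table[
 {alpha, N[Count[pValues, p_ /; p <= alpha]/numTests]},
 {alpha, {0.01, 0.05, 0.1, 0.5}}]
```

If the fractions track the levels, then a p-value of about 0.03 from truly Gaussian data is nothing unusual: values that small or smaller should turn up in roughly 3 runs out of every 100.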