Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122611] Re: Problems with DistributionFitTest
- From: Barrie Stokes <Barrie.Stokes at newcastle.edu.au>
- Date: Thu, 3 Nov 2011 03:46:19 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201111010502.AAA14754@smc.vnet.net>
Hi Felipe
Can I beg to make a small clarification to Andy's response?
The whole idea of p values and rejection of the Null Hypothesis continues to tangle people up in logical and linguistic knots.
An observed p value of 0.0312946 does *not* allow one to make a *general* claim like "about 3% of the time you can expect to get a test statistic like the one you obtained or one even more extreme".
In this context, the observed p value of 0.0312946, being less than 0.05, allows a frequentist-classical statistician to say that, *on this occasion*, the Null Hypothesis (which is that the data are Gaussian) can be rejected at the 5% significance level, or some equivalent phrase.
The important thing here is that, *by construction*, p values are equal to or less than 0.05 precisely 5% of the time *when the Null Hypothesis holds, i.e., is in fact true*, or "under the Null Hypothesis", as it's usually phrased.
When one rejects the Null Hypothesis (having obtained a p value <= 0.05), one is in fact betting on a procedure that, when the Null Hypothesis is true, wrongly rejects it only 1 time in 20.
If anyone doesn't like this explication, please note that I am a Bayesian, so for me to explain a p value is like George Bush explaining the meaning of the French word 'entrepreneur'. :-)
(Apparently GB once claimed that the trouble with the French is that they don't have a word for 'entrepreneur'. Actually, they do.)
You may find the following code (built on your original code) helpful - run it as many times as your patience allows.
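numTests = 1000;
resultsList = {};
(* Each pass draws a fresh Gaussian sample and stores the p value of the normality test *)
Do[
  data = RandomVariate[NormalDistribution[], 10000];
  AppendTo[resultsList, DistributionFitTest[data]],
  {numTests}
]
resultsList // Short
(* Fraction of p values at or below 0.05; expect a value close to 0.05 *)
Length[Select[resultsList, (s \[Function] s <= 0.05)]]/numTests // N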
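
As a further check, here is a rough sketch (not a definitive recipe) that compares the fraction of p values at or below several cut-offs, not just 0.05. The names sampleSize, pValues and thresholds, the smaller sample size, the particular cut-offs and the histogram bin width are all arbitrary choices made for illustration. If the p values are uniform on [0, 1] under the Null Hypothesis, as Andy says, each fraction should come out close to its cut-off and the histogram should look roughly flat.

```mathematica
(* Assumption: 1000 points per sample (smaller than the 10000 used above) so this runs faster *)
numTests = 1000;
sampleSize = 1000;

(* One p value per simulated Gaussian sample; the Null Hypothesis is true by construction *)
pValues = Table[
   DistributionFitTest[RandomVariate[NormalDistribution[], sampleSize]],
   {numTests}];

(* Fraction of p values at or below each cut-off *)
thresholds = {0.01, 0.05, 0.10, 0.50};
TableForm[
  Table[{alpha, N[Count[pValues, p_ /; p <= alpha]/numTests]}, {alpha, thresholds}],
  TableHeadings -> {None, {"cut-off", "fraction at or below"}}
]

(* Visual check: bins of width 0.05 on [0, 1] should be roughly equal in height *)
Histogram[pValues, {0, 1, 0.05}]
```

The fractions will wobble from run to run, since each is itself an estimate from a finite number of tests.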
Cheers
Barrie
>>> On 02/11/2011 at 10:23 pm, in message <201111021123.GAA03608 at smc.vnet.net>,
Andy Ross <andyr at wolfram.com> wrote:
> This is exactly what you might expect. The p-value from a hypothesis
> test is itself a random variable. Under the null hypothesis the p-value
> should follow a UniformDistribution[{0,1}].
>
> In your case, the null hypothesis is that the data have been drawn from
> a normal distribution. What that p-value is really saying is that about
> 3% of the time you can expect to get a test statistic like the one you
> obtained or one even more extreme.
>
> Andy Ross
> Wolfram Research
>
>
> On 11/1/2011 12:02 AM, fd wrote:
>> Dear Group
>>
>> I'm not a specialist in statistics, but I spoke to one who found this
>> behaviour dubious.
>>
>> Before using DistributionFitTest I was doing some tests with the
>> normal distribution, like this
>>
>> data = RandomVariate[NormalDistribution[], 10000];
>>
>> DistributionFitTest[data]
>>
>> 0.0312946
>>
>> According to the documentation "A small p-value suggests that it is
>> unlikely that the data came from dist", and that the test assumes the
>> data is normally distributed
>>
>> I found this result for the p-value to be really low, if I re-run the
>> code I often get what I would expect (a number greater than 0.5) but
>> it is not at all rare to obtain p values smaller than 0.05 and even
>> smaller. Through multiple re-runs I notice it fluctuates by orders of
>> magnitude.
>>
>> The statistician I consulted with found this weird since the data was
>> drawn from a normal distribution and the sample size is big,
>> especially because the Pearson X2 test also fluctuates like this:
>>
>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>
>> H["TestDataTable", All]
>>
>> Is this a real issue?
>>
>> Any thoughts?
>>
>> Best regards
>> Felipe
>>
>>
>>
>>