MathGroup Archive: November 2011 [00053]

Re: Problems with DistributionFitTest

To: mathgroup at smc.vnet.net
Subject: [mg122611] Re: Problems with DistributionFitTest
From: Barrie Stokes <Barrie.Stokes at newcastle.edu.au>
Date: Thu, 3 Nov 2011 03:46:19 -0500 (EST)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <201111010502.AAA14754@smc.vnet.net>

Hi Felipe

Can I beg to make a small clarification to Andy's response?

The whole idea of p values and rejection of the Null Hypothesis continues to be one in which people get tangled up in logical and linguistic knots.

An observed p value does *not* allow one to make a *general* claim like "about 3% of the time you can expect to get a test statistic like the one you obtained or one even more extreme".

Given the context of this p value, its value being 0.0312946, i.e., less than 0.05, allows a frequentist-classical statistician to say that, *on this occasion*, this observed p value enables me to reject the Null Hypothesis (which is that the data are Gaussian) at the 5% significance level, or some such equivalent phrase.

The important thing here is that, *by construction*, p values are equal to or less than 0.05 precisely 5% of the time *when the Null Hypothesis holds, i.e., is in fact true*, or "under the Null Hypothesis", as it's usually phrased.

When one rejects the Null Hypothesis (having obtained a p value <= 0.05), one is in fact betting on a rule that, when the Null Hypothesis is true, leads one to reject it wrongly only 1 time in 20.

If anyone doesn't like this explication, please note that I am a Bayesian, so for me to explain a p value is like George Bush explaining the meaning of the French word 'entrepreneur'.  :-)

(Apparently GB once claimed that the trouble with the French is that they don't have a word for 'entrepreneur'. Actually, they do.)

You may find the following code (built on your original code) helpful - run it as many times as your patience allows.

numTests = 1000;
(* each run draws a fresh Gaussian sample and records its p value *)
resultsList = Table[
   DistributionFitTest[RandomVariate[NormalDistribution[], 10000]],
   {numTests}];
resultsList // Short
(* fraction of runs with p value <= 0.05; expect something near 0.05 *)
N[ Count[ resultsList, p_ /; p <= 0.05 ]/numTests ]
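
If you also want to look at Andy's remark that, under the Null Hypothesis, the p value should follow a UniformDistribution[{0,1}], a rough sketch along the following lines may help. The names sampleSize, numRuns and pValues are just illustrative, and the smaller sample size is an assumption made only so that it runs quickly.

```mathematica
(* illustrative names and sizes; assumptions, not part of the original code *)
sampleSize = 1000;
numRuns = 500;

(* one p value per freshly drawn Gaussian sample *)
pValues = Table[
   DistributionFitTest[RandomVariate[NormalDistribution[], sampleSize]],
   {numRuns}];

(* if the p values are uniform on [0, 1], the bars should be roughly level *)
Histogram[pValues, {0, 1, 0.05}]

(* the fraction of p values at or below each threshold should be near the threshold *)
Table[
 {alpha, N[Count[pValues, p_ /; p <= alpha]/numRuns]},
 {alpha, {0.01, 0.05, 0.10, 0.50}}]
```

With only a few hundred runs the fractions will wobble around the thresholds, so treat the output as a sanity check rather than a proof.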

Cheers

Barrie

>>> On 02/11/2011 at 10:23 pm, in message <201111021123.GAA03608 at smc.vnet.net>,
Andy Ross <andyr at wolfram.com> wrote:
> This is exactly what you might expect.  The p-value from a hypothesis 
> test is itself a random variable. Under the null hypothesis the p-value 
> should follow a UniformDistribution[{0,1}].
> 
> In your case, the null hypothesis is that the data have been drawn from 
> a normal distribution. What that p-value is really saying is that about 
> 3% of the time you can expect to get a test statistic like the one you 
> obtained or one even more extreme.
> 
> Andy Ross
> Wolfram Research
> 
> 
> On 11/1/2011 12:02 AM, fd wrote:
>> Dear Group
>>
>> I'm not a specialist in statistics, but I spoke to one who found this
>> behaviour dubious.
>>
>> Before using DistributionFitTest I was doing some tests with the
>> normal distribution, like this
>>
>> data = RandomVariate[NormalDistribution[], 10000];
>>
>> DistributionFitTest[data]
>>
>> 0.0312946
>>
>> According to the documentation "A small p-value suggests that it is
>> unlikely that the data came from dist", and that the test assumes the
>> data is normally distributed
>>
>> I found this result for the p-value to be really low, if I re-run the
>> code I often get what I would expect (a number greater than 0.5) but
>> it is not at all rare to obtain p values smaller than 0.05 and even
>> smaller. Through multiple re-runs I notice it fluctuates by orders of
>> magnitude.
>>
>> The statistician I consulted with found this weird since the data was
>> drawn from a normal distribution and the sample size is big,
>> especially because the Pearson chi-squared test also fluctuates like this:
>>
>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>
>> H["TestDataTable", All]
>>
>> Is this a real issue?
>>
>> Any thoughts?
>>
>> Best regards
>> Felipe
>>
>>
>>
>>
