Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122639] Re: Problems with DistributionFitTest
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Fri, 4 Nov 2011 06:00:36 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201111010502.AAA14754@smc.vnet.net>
- Reply-to: drmajorbob at yahoo.com
I like this one:
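numTests = 1000;
First@Timing[
  (* sorted {p value, 1 if rejected at the 5% level else 0} pairs,
     with a {0, 1} point prepended so the curve starts at the origin *)
  tests = Flatten[{{{0, 1}},
     Sort@Table[
       {t = DistributionFitTest@RandomVariate[NormalDistribution[], 10000],
        Boole[t <= .05]}, {numTests}]}, 1];
  (* empirical quantile function of the p values: under the null it should
     track the diagonal, so f[x] - x should stay close to zero *)
  f = Interpolation@
    Thread[{Range[0, numTests]/numTests, tests[[All, 1]]}];
  Print@Plot[f@x - x, {x, 0, 1}, ImageSize -> 400];
  (* {mean p value, rejection rate}; expect roughly {0.5, 0.05} *)
  Print@N@Mean@Rest@tests
  ] "seconds"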
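For readers without Mathematica, here is a rough Python analogue of the experiment above. It is a sketch under stated assumptions, not what DistributionFitTest does internally. It swaps in a one-sample Kolmogorov-Smirnov test against a fully specified N(0, 1). DistributionFitTest can run a KS test, but that is not necessarily the one it reports by default. The p value comes from the asymptotic Kolmogorov distribution with Stephens' small-sample correction. The helper names ks_pvalue, simulate_pvalues and summarize are hypothetical. The sketch reports the two numbers printed above: the mean p value, expected near 0.5, and the rejection rate at 0.05, expected near 0.05. It also reports the largest gap between the empirical CDF of the p values and the Uniform(0, 1) CDF, which is a one-number summary of what the plot shows.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # fully specified null hypothesis: N(0, 1)


def ks_pvalue(sample):
    """Approximate p value of the one-sample Kolmogorov-Smirnov test of
    `sample` against N(0, 1), via the asymptotic Kolmogorov distribution
    with Stephens' correction (as in Numerical Recipes)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = STD_NORMAL.cdf(x)
        # distance between the model CDF and the empirical CDF on either side of x
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    # Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2), truncated at k = 100
    q = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return min(max(q, 0.0), 1.0)


def simulate_pvalues(num_tests, n, seed=0):
    """p values for num_tests fresh N(0, 1) samples of size n, i.e. data
    for which the null hypothesis really is true."""
    rng = random.Random(seed)
    return [ks_pvalue([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(num_tests)]


def summarize(pvalues, alpha=0.05):
    """(mean p value, rejection rate at alpha, max gap to the uniform CDF)."""
    ps = sorted(pvalues)
    m = len(ps)
    reject_rate = sum(p <= alpha for p in ps) / m
    max_gap = max(max((i + 1) / m - p, p - i / m) for i, p in enumerate(ps))
    return fmean(ps), reject_rate, max_gap


if __name__ == "__main__":
    mean_p, rate, gap = summarize(simulate_pvalues(1000, 500))
    print(f"mean p = {mean_p:.3f}, rejected = {rate:.3f}, max gap = {gap:.3f}")
```

Each rerun with a different seed should print a mean near 0.5, a rejection rate near 0.05 and a small gap. Individual p values below 0.05 are expected about once in twenty samples.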
Bobby
On Thu, 03 Nov 2011 03:46:19 -0500, Barrie Stokes
<Barrie.Stokes at newcastle.edu.au> wrote:
> Hi Felipe
>
> Can I beg to make a small clarification to Andy's response?
>
> The whole idea of p values and rejection of the Null Hypothesis
> continues to be one in which people get tangled up in logical and
> linguistic knots.
>
> An observed p value such as this one does *not* allow one to make a
> *general* claim like "about 3% of the time you can expect to get a test
> statistic like the one you obtained or one even more extreme".
>
> In this context, the fact that the p value is 0.0312946, i.e., less
> than 0.05, allows a frequentist-classical statistician to say that,
> *on this occasion*, this observed p value enables me to reject the Null
> Hypothesis (which is that the data are Gaussian) at the 5% significance
> level, or some such equivalent phrase.
>
> The important thing here is that, *by construction*, p values are equal
> to or less than 0.05 precisely 5% of the time *when the Null Hypothesis
> holds, i.e., is in fact true*, or "under the Null Hypothesis", as it's
> usually phrased.
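The "by construction" argument can be written out in one line. This assumes that the test statistic T has a continuous CDF F under the Null Hypothesis, and that large values of T count as extreme, so the p value is p = 1 - F(T):

```latex
\Pr_{H_0}(p \le \alpha)
  = \Pr_{H_0}\bigl(F(T) \ge 1 - \alpha\bigr)
  = 1 - (1 - \alpha)
  = \alpha ,
\qquad \text{because } F(T) \sim \mathrm{Uniform}(0, 1) \text{ under } H_0 .
```

Setting alpha = 0.05 gives the 5% figure. Because the identity holds for every alpha, it is the same fact as "the p value is Uniform(0, 1) under the null". It also means that a p value at or below 0.001 should turn up in about 1 run in 1000 when the data really are normal.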
>
> When one rejects the Null Hypothesis (having obtained a p value <= 0.05),
> one is in fact betting on a procedure that, whenever the Null Hypothesis
> is true, will wrongly reject it only 1 time in 20.
>
> If anyone doesn't like this explication, please note that I am a
> Bayesian, so for me to explain a p value is like George Bush explaining
> the meaning of the French word 'entrepreneur'. :-)
>
> (Apparently GB once claimed that the trouble with the French is that
> they don't have a word for 'entrepreneur'. Actually, they do.)
>
> You may find the following code (built on your original code) helpful -
> run it as many times as your patience allows.
>
> numTests = 1000;
> (* one DistributionFitTest p value per fresh standard-normal sample *)
> resultsList = Table[
>    DistributionFitTest[RandomVariate[NormalDistribution[], 10000]],
>    {numTests}];
> resultsList // Short
> (* fraction of p values at or below 0.05; about 0.05 is expected *)
> Count[resultsList, p_ /; p <= 0.05]/numTests // N
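The fraction printed by this loop should itself wobble around 0.05 from run to run. Assume the p values are independent and the Null Hypothesis holds. Then the number of rejections among numTests p values is Binomial(numTests, 0.05). The Python sketch below computes that count's mean, its standard deviation and an exact central 95% range, which helps judge whether a rerun's output is surprising. The function names are hypothetical.

```python
import math


def binomial_cdf_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        if cumulative >= q:
            return k
    return n


def rejection_count_spread(num_tests, alpha=0.05, coverage=0.95):
    """Mean, standard deviation and central `coverage` range of the number
    of p values <= alpha among num_tests independent tests under the null."""
    mean = num_tests * alpha
    sd = math.sqrt(num_tests * alpha * (1 - alpha))
    tail = (1 - coverage) / 2
    lo = binomial_cdf_quantile(num_tests, alpha, tail)
    hi = binomial_cdf_quantile(num_tests, alpha, 1 - tail)
    return mean, sd, lo, hi


if __name__ == "__main__":
    mean, sd, lo, hi = rejection_count_spread(1000)
    print(f"expected {mean:.1f} rejections (sd {sd:.2f}); "
          f"95% of runs between {lo} and {hi}")
```

With numTests = 1000 the standard deviation of the count is sqrt(47.5), about 6.9. The printed fraction is therefore roughly 0.05 plus or minus 0.014 (two standard deviations), so reruns giving 0.04 or 0.06 are ordinary.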
>
> Cheers
>
> Barrie
>
> On 02/11/2011 at 10:23 pm, in message
> <201111021123.GAA03608 at smc.vnet.net>, Andy Ross <andyr at wolfram.com>
> wrote:
>> This is exactly what you might expect. The p-value from a hypothesis
>> test is itself a random variable. Under the null hypothesis the p-value
>> should follow a UniformDistribution[{0,1}].
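This uniformity can be checked without DistributionFitTest at all. The Python sketch below swaps in a deliberately simple test: a two-sided z-test of "mean = 0" with the standard deviation assumed known to be 1. It is not one of the normality tests DistributionFitTest runs. It is used here because its p value is exactly Uniform(0, 1) when the data really are N(0, 1). The sketch bins many such p values into deciles, and each decile should hold about a tenth of them. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_pvalue(sample):
    """Two-sided p value for H0: mean = 0, assuming the standard deviation
    is known to be 1. Under H0, sqrt(n) * mean(sample) is exactly N(0, 1)."""
    z = math.sqrt(len(sample)) * fmean(sample)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def decile_counts(pvalues):
    """How many p values fall in [0, 0.1), [0.1, 0.2), ..., [0.9, 1]."""
    counts = [0] * 10
    for p in pvalues:
        counts[min(int(p * 10), 9)] += 1
    return counts


def simulate_decile_counts(num_tests=5000, n=50, seed=1):
    """Decile counts of p values from num_tests N(0, 1) samples of size n."""
    rng = random.Random(seed)
    pvalues = [z_test_pvalue([rng.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(num_tests)]
    return decile_counts(pvalues)


if __name__ == "__main__":
    print(simulate_decile_counts())  # each entry should be near 5000 / 10
```

The first decile, [0, 0.1), is just as full as any other. This is why p values below 0.05 keep appearing even though every sample is genuinely normal.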
>>
>> In your case, the null hypothesis is that the data have been drawn from
>> a normal distribution. What that p-value is really saying is that about
>> 3% of the time you can expect to get a test statistic like the one you
>> obtained or one even more extreme.
>>
>> Andy Ross
>> Wolfram Research
>>
>>
>> On 11/1/2011 12:02 AM, fd wrote:
>>> Dear Group
>>>
>>> I'm not a specialist in statistics, but I spoke to one who found this
>>> behaviour dubious.
>>>
>>> Before using DistributionFitTest I was doing some tests with the
>>> normal distribution, like this:
>>>
>>> data = RandomVariate[NormalDistribution[], 10000];
>>>
>>> DistributionFitTest[data]
>>>
>>> 0.0312946
>>>
>>> According to the documentation, "A small p-value suggests that it is
>>> unlikely that the data came from dist", and, when no dist is given,
>>> the null hypothesis is that the data are normally distributed.
>>>
>>> I found this p-value really low. If I re-run the code I often get
>>> what I would expect (a number greater than 0.5), but it is not at all
>>> rare to obtain p-values smaller than 0.05, and even smaller. Over
>>> multiple re-runs I notice that it fluctuates by orders of magnitude.
>>>
>>> The statistician I consulted found this weird, since the data were
>>> drawn from a normal distribution and the sample size is large,
>>> especially because the Pearson chi-square (X^2) test also fluctuates
>>> like this:
>>>
>>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>>
>>> H["TestDataTable", All]
>>>
>>> Is this a real issue?
>>>
>>> Any thoughts?
>>>
>>> Best regards
>>> Felipe
>
>
--
DrMajorBob at yahoo.com