MathGroup Archive: November 2011 [00021]


Re: Problems with DistributionFitTest

To: mathgroup at smc.vnet.net
Subject: [mg122581] Re: Problems with DistributionFitTest
From: Felipe Dimer de Oliveira <fdimer at gmail.com>
Date: Wed, 2 Nov 2011 06:21:22 -0500 (EST)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <201111010502.AAA14754@smc.vnet.net> <op.v39qs0xotgfoz2@bobbys-imac.local>

Thanks for that; very enlightening.

Best
Felipe

On 02/11/2011, at 12:08 AM, DrMajorBob wrote:

> A statistical test at the 5% level should (by definition) reject the null hypothesis 5% of the time when the null hypothesis is true. Such a false rejection is called a Type 1 error.
>
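Stated as a formula (a sketch assuming the test statistic has a continuous null distribution, so that the p-value is uniform on [0, 1] when the null hypothesis H_0 holds; for tests whose p-values are approximate or discrete, this holds only approximately):

```latex
p \sim \mathrm{Uniform}(0,1) \ \text{under } H_0
\;\Longrightarrow\;
\Pr(p \le \alpha \mid H_0) = \alpha,
\qquad\text{e.g. } \Pr(p \le 0.05 \mid H_0) = 0.05,
\quad \Pr(p > 0.5 \mid H_0) = 0.5 .
```

In particular, when the null hypothesis is true, a p-value above 0.5 is expected only about half the time.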
> For example:
>
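> n = 10^4;
> (* SetDelayed (:=) draws a fresh sample of n standard normals each time test is evaluated *)
> test := DistributionFitTest[RandomVariate[NormalDistribution[], n]]
> (* fraction of 1000 p-values below 0.05, i.e. the observed Type 1 error rate *)
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000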
>
> 0.063
>
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
>
> 0.071
>
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
>
> 0.049
>
> In three runs of 1000 trials each (each trial using 10,000 random variates), Type 1 errors occurred 6.3%, 7.1%, and 4.9% of the time.
>
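How far from 5% can such rates wander by chance? A rough yardstick, assuming the 1000 trials in a run are independent and the test rejects exactly 5% of the time under the null hypothesis, treats the number of rejections K in one run as binomial:

```latex
K \sim \mathrm{Binomial}(1000,\ 0.05), \qquad
\operatorname{SE}\!\left(\frac{K}{1000}\right)
  = \sqrt{\frac{0.05 \times 0.95}{1000}}
  = \sqrt{4.75 \times 10^{-5}} \approx 0.0069,
\qquad
0.05 \pm 2(0.0069) \approx [0.036,\ 0.064].
```

By this yardstick, 4.9% and 6.3% lie within about two standard errors of 5%, while 7.1% is about three standard errors above it; three runs are too few to say more.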
> Here's an instructive plot, as well:
>
> ListPlot@Sort@Table[test, {1000}]
>
> It looks much like a straight line from (0, 0) to (1000, 1), i.e. y = x/1000 (the x-axis is the trial index), just as it's supposed to.
>
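The expected shape follows from the order statistics of the uniform distribution, assuming the m = 1000 p-values are independent and uniform on [0, 1] under the null hypothesis:

```latex
p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}, \quad
p_j \overset{\text{iid}}{\sim} \mathrm{Uniform}(0,1)
\;\Longrightarrow\;
\mathbb{E}\bigl[p_{(i)}\bigr] = \frac{i}{m+1}, \qquad m = 1000 .
```

So the sorted values plotted against their index i should scatter around the line y = x/1001, essentially the straight line from (0, 0) to (1000, 1).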
> It is said that statisticians (especially social scientists) are doomed to publish their Type 1 errors; hence, a lot of published papers state wrong conclusions.
>
> Type 2 errors (failing to reject a false null hypothesis) are another can of worms, since how often they occur depends on the true distribution, which is generally unknown.
>
> Bobby
>
> On Tue, 01 Nov 2011 00:02:38 -0500, fd <fdimer at gmail.com> wrote:
>
>> Dear Group
>>
>> I'm not a specialist in statistics, but I spoke to one who found this
>> behaviour dubious.
>>
>> Before using DistributionFitTest, I was doing some tests with the
>> normal distribution, like this:
>>
>> data = RandomVariate[NormalDistribution[], 10000];
>>
>> DistributionFitTest[data]
>>
>> 0.0312946
>>
>> According to the documentation, "A small p-value suggests that it is
>> unlikely that the data came from dist", and, when no distribution is
>> specified, the null hypothesis is that the data is normally distributed.
>>
>> I found this p-value really low. If I re-run the code, I often get what
>> I would expect (a number greater than 0.5), but it is not at all rare to
>> obtain p-values smaller than 0.05, or even smaller. Over multiple re-runs
>> I noticed that it fluctuates by orders of magnitude.
>>
>> The statistician I consulted found this weird, since the data was drawn
>> from a normal distribution and the sample size is large, especially
>> because the Pearson χ² test also fluctuates like this:
>>
>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>
>> H["TestDataTable", All]
>>
>> Is this a real issue?
>>
>> Any thoughts?
>>
>> Best regards
>> Felipe
>
> --
> DrMajorBob at yahoo.com
