Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122588] Re: Problems with DistributionFitTest
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Wed, 2 Nov 2011 06:22:39 -0500 (EST)
- References: <201111010502.AAA14754@smc.vnet.net>
- Reply-to: drmajorbob at yahoo.com
A statistical test at the 5% level should (by definition) reject the null
hypothesis 5% of the time when the null hypothesis is true. Such a false
rejection is called a Type 1 error.
For example:
n = 10^4;
(* p-value of a normality test on n fresh standard-normal variates *)
test := DistributionFitTest[RandomVariate[NormalDistribution[], n]]
(* fraction of 1000 p-values that fall below 0.05 *)
N@Count[Table[test, {1000}], x_ /; x < .05]/1000
0.063
N@Count[Table[test, {1000}], x_ /; x < .05]/1000
0.071
N@Count[Table[test, {1000}], x_ /; x < .05]/1000
0.049
In three runs of 1000 trials each (each trial using 10000 random variates),
Type 1 errors occurred 6.3%, 7.1%, and 4.9% of the time.
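For a sense of scale, the scatter to expect in such a fraction can be
estimated from the binomial distribution. A rough sketch, assuming the 1000
trials are independent and the true rejection rate is exactly 5% (the names
p, trials, and se are just for illustration):
p = 0.05; trials = 1000;
se = Sqrt[p (1 - p)/trials]   (* binomial standard error, about 0.0069 *)
{p - 2 se, p + 2 se}          (* two-standard-error band, about {0.036, 0.064} *)
So a single run that lands a percentage point or so away from 5% is
unremarkable.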
Here's an instructive plot, as well:
ListPlot@Sort@Table[test, {1000}]
It looks much like a plot of y = x (with the horizontal axis running over
the trial index 1 to 1000 rather than 0 to 1), just as it's supposed to do.
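That near-linear shape is what you get when the p-values are roughly
uniformly distributed on [0, 1], which is the expected behavior when the
null hypothesis is true. One way to check this numerically, as a sketch
only (the name pvals is mine; KolmogorovSmirnovTest is the built-in test):
pvals = Table[test, {1000}];
(* plot against i/1000 so the reference line is exactly y = x *)
ListPlot[Transpose[{Range[1000]/1000., Sort[pvals]}]]
(* p-value for the hypothesis that pvals are uniform on [0, 1] *)
KolmogorovSmirnovTest[pvals, UniformDistribution[{0, 1}]]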
It is said that statisticians (especially social scientists) are doomed to
publish their Type 1 errors. Hence, a lot of published papers state wrong
conclusions.
Type 2 errors are another can of worms, since it's generally unknown how
frequent they might be.
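For one specific, assumed alternative, though, the Type 2 rate can be
estimated by the same kind of simulation. A sketch, in which the choice of
StudentTDistribution[10] as the true distribution, the count of 200 trials,
and the name altTest are all mine, purely for illustration:
(* p-value of a normality test on data that is NOT normal *)
altTest := DistributionFitTest[RandomVariate[StudentTDistribution[10], 10^4]]
(* fraction of trials in which normality is not rejected at the 5% level *)
N@Count[Table[altTest, {200}], x_ /; x >= .05]/200
The result applies only to that one alternative and sample size; in real
data the true distribution is unknown, which is the difficulty.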
Bobby
On Tue, 01 Nov 2011 00:02:38 -0500, fd <fdimer at gmail.com> wrote:
> Dear Group
>
> I'm not a specialist in statistics, but I spoke to one who found this
> behaviour dubious.
>
> Before using DistributionFitTest I was doing some tests with the
> normal distribution, like this
>
> data = RandomVariate[NormalDistribution[], 10000];
>
> DistributionFitTest[data]
>
> 0.0312946
>
> According to the documentation, "A small p-value suggests that it is
> unlikely that the data came from dist", and when no distribution is
> given the test checks whether the data is normally distributed.
>
> I found this p-value to be really low. If I re-run the code I often get
> what I would expect (a number greater than 0.5), but it is not at all
> rare to obtain p-values smaller than 0.05, or even smaller. Over
> multiple re-runs I notice it fluctuates by orders of magnitude.
>
> The statistician I consulted found this weird, since the data was
> drawn from a normal distribution and the sample size is big,
> especially because the Pearson chi-squared test also fluctuates like this:
>
> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>
> H["TestDataTable", All]
>
> Is this a real issue?
>
> Any thoughts?
>
> Best regards
> Felipe
>
>
>
>
--
DrMajorBob at yahoo.com