Re: Problems with DistributionFitTest

*To*: mathgroup at smc.vnet.net
*Subject*: [mg122588] Re: Problems with DistributionFitTest
*From*: DrMajorBob <btreat1 at austin.rr.com>
*Date*: Wed, 2 Nov 2011 06:22:39 -0500 (EST)
*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com
*References*: <201111010502.AAA14754@smc.vnet.net>
*Reply-to*: drmajorbob at yahoo.com

A statistical test at the 5% level should (by definition) reject the null hypothesis 5% of the time when the null hypothesis is true. Such a rejection is called a Type 1 error. For example:

n = 10^4;
test := DistributionFitTest[RandomVariate[NormalDistribution[], n]]

N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
0.063

N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
0.071

N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
0.049

In three tests of 1000 trials each (each with 10000 random variates), Type 1 errors occurred 6.3%, 7.1%, and 4.9% of the time.

Here's an instructive plot, as well:

ListPlot@Sort@Table[test, {1000}]

It looks much like a plot of y = x, just as it's supposed to: when the null hypothesis is true, the p-values are uniformly distributed on [0, 1], so the sorted values fall along a straight line.

It is said that statisticians (especially social scientists) are doomed to publish their Type 1 errors. Hence, a lot of published papers state wrong conclusions.

Type 2 errors are another can of worms, since it's generally unknown how frequent they might be.

Bobby

On Tue, 01 Nov 2011 00:02:38 -0500, fd <fdimer at gmail.com> wrote:

> Dear Group
>
> I'm not a specialist in statistics, but I spoke to one who found this
> behaviour dubious.
>
> Before using DistributionFitTest I was doing some tests with the
> normal distribution, like this
>
> data = RandomVariate[NormalDistribution[], 10000];
>
> DistributionFitTest[data]
>
> 0.0312946
>
> According to the documentation "A small p-value suggests that it is
> unlikely that the data came from dist", and that the test assumes the
> data is normally distributed
>
> I found this result for the p-value to be really low, if I re-run the
> code I often get what I would expect (a number greater than 0.5) but
> it is not at all rare to obtain p values smaller than 0.05 and even
> smaller. Through multiple re-runs I notice it fluctuates by orders of
> magnitude.
>
> The statistician I consulted with found this weird since the data was
> drawn from a normal distribution and the sample size is big,
> especially because the Pearson X2 test also fluctuates like this:
>
> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>
> H["TestDataTable", All]
>
> Is this a real issue?
>
> Any thoughts
>
> Best regards
> Felipe
>

--
DrMajorBob at yahoo.com