Re: Problems with DistributionFitTest
- To: mathgroup at smc.vnet.net
- Subject: [mg122581] Re: Problems with DistributionFitTest
- From: Felipe Dimer de Oliveira <fdimer at gmail.com>
- Date: Wed, 2 Nov 2011 06:21:22 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201111010502.AAA14754@smc.vnet.net> <op.v39qs0xotgfoz2@bobbys-imac.local>
Thanks for that, very enlightening.

Best
Felipe

On 02/11/2011, at 12:08 AM, DrMajorBob wrote:

> A statistical test at the 5% level should (by definition) reject the null hypothesis 5% of the time when the null hypothesis is true. This is called a Type 1 error.
>
> For example:
>
> n = 10^4;
> test := DistributionFitTest[RandomVariate[NormalDistribution[], n]]
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
>
> 0.063
>
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
>
> 0.071
>
> N@Count[Sort@Table[test, {1000}], x_ /; x < .05]/1000
>
> 0.049
>
> In three tests of 1000 trials each (each with 10000 random variates), Type 1 errors occurred 6.3%, 7.1%, and 4.9% of the time.
>
> Here's an instructive plot, as well:
>
> ListPlot@Sort@Table[test, {1000}]
>
> It looks much like a plot of y = x, just as it's supposed to do.
>
> It is said that statisticians (especially social scientists) are doomed to publish their type 1 errors. Hence, a lot of published papers state wrong conclusions.
>
> Type 2 errors are another can of worms, since it's generally unknown how frequent they might be.
>
> Bobby
>
> On Tue, 01 Nov 2011 00:02:38 -0500, fd <fdimer at gmail.com> wrote:
>
>> Dear Group
>>
>> I'm not a specialist in statistics, but I spoke to one who found this
>> behaviour dubious.
>>
>> Before using DistributionFitTest I was doing some tests with the
>> normal distribution, like this
>>
>> data = RandomVariate[NormalDistribution[], 10000];
>>
>> DistributionFitTest[data]
>>
>> 0.0312946
>>
>> According to the documentation "A small p-value suggests that it is
>> unlikely that the data came from dist", and that the test assumes the
>> data is normally distributed
>>
>> I found this result for the p-value to be really low, if I re-run the
>> code I often get what I would expect (a number greater than 0.5) but
>> it is not at all rare to obtain p values smaller than 0.05 and even
>> smaller. Through multiple re-runs I notice it fluctuates by orders of
>> magnitude.
>>
>> The statistician I consulted with found this weird since the data was
>> drawn from a normal distribution and the sample size is big,
>> especially because the Pearson X2 test also fluctuates like this:
>>
>> H=DistributionFitTest[data, Automatic, "HypothesisTestData"];
>>
>> H["TestDataTable", All]
>>
>> Is this a real issue?
>>
>> Any thoughts
>>
>> Best regards
>> Felipe
>
> --
> DrMajorBob at yahoo.com
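
A note on the run-to-run variation in the rejection rates quoted above: if the p-values are uniform on [0, 1] when the null hypothesis is true, the number of p-values below 0.05 in 1000 trials follows a binomial distribution, so the observed rate has a standard deviation of roughly 0.007. The following is a minimal sketch of that check and is not part of the original thread. The names alpha, trials, pValue, pValues, observedRate and expectedSD are illustrative, and the uniformity test at the end is an extra step assumed here, not something the posters ran.

```mathematica
(* Sketch only: how much should the observed rejection rate vary between runs? *)
alpha = 0.05;    (* nominal significance level *)
trials = 1000;   (* repeated tests per run, as in the reply above *)
n = 10^4;        (* sample size per test, as in the reply above *)

(* Each evaluation draws a fresh normal sample and returns its p-value *)
pValue := DistributionFitTest[RandomVariate[NormalDistribution[], n]]

pValues = Table[pValue, {trials}];

(* Observed fraction of p-values below alpha *)
observedRate = N[Count[pValues, p_ /; p < alpha]/trials]

(* Assuming uniform p-values, the rejection count is binomial with       *)
(* parameters trials and alpha, so the rate has this standard deviation  *)
expectedSD = Sqrt[alpha (1 - alpha)/trials]   (* about 0.0069 *)

(* Assumed extra check: do the p-values look uniform on [0, 1]? *)
KolmogorovSmirnovTest[pValues, UniformDistribution[{0, 1}]]
```

Comparing observedRate with alpha plus or minus a few multiples of expectedSD gives a rough sense of whether a single run is within ordinary sampling noise. The sorted ListPlot in the reply is a visual version of the same uniformity check.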