Re: The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test
- To: mathgroup at smc.vnet.net
- Subject: [mg65018] Re: [mg64990] The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Sat, 11 Mar 2006 05:16:06 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Here is an implementation of the D`Agostino Pearson K^2 test based on the discussion in R. B. D'Agostino and M. A. Stephens, editors. Goodness-of-Fit Techniques. Marcel Dekker, Inc., 1986. In[1]:= << Statistics` (* This defines the SU standard normal approximation for sample skewness given on page 377 of D`Agostino and Stephens. *) In[2]:= susqrtb1[data_, len_] := Block[{y, beta2, wsq, delta, alpha}, y = Skewness[data] Sqrt[(len + 1) (len + 3)/(6 (len - 2))]; beta2 = 3 (len^2 + 27 len - 70) (len + 1) (len + 3)/((len - 2) (len + 5) (len + 7) (len + 9)); wsq = Sqrt[2 (beta2 - 1)] - 1; delta = 1/Sqrt[Log[wsq]]; (* In the text, alpha is Sqrt[2/(wsq - 1)], but that appears to be a typo. alpha = Sqrt[1/(wsq - 1)] gives values consistent with a standard normal. *) alpha = Sqrt[1/(wsq - 1)]; delta*Log[y/alpha + Sqrt[(y/alpha)^2 + 1]]] (* This defines the Anscombe Glynn standard normal approximation for sample kurtosis given on page 388 of D`Agostino and Stephens. *) In[3]:= agb2[data_, len_] := Block[{b2, eb2, sdb2, xx, sqrtbeta1b2, aa}, b2 = Kurtosis[data]; eb2 = 3 (len - 1)/(len + 1); sdb2 = Sqrt[24 len (len - 2) (len - 3)/((len + 1)^2 (len + 3) (len + 5))]; xx = (b2 - eb2)/sdb2; sqrtbeta1b2 = 6 (len^2 - 5 len + 2)/((len + 7) (len + 9))* Sqrt[6 (len + 3) (len + 5)/(len (len - 2) (len - 3))]; aa = 6 + 8/sqrtbeta1b2 (2/sqrtbeta1b2 + Sqrt[1 + 4/sqrtbeta1b2^2]); (1 - 2/(9 aa) - ((1 - 2/aa)/(1 + xx*Sqrt[2/(aa - 4)]))^(1/3))/ Sqrt[2/(9 aa)]] (* The D`Agostino and Pearson K^2 test statistic is the sum of squares of the normal approximations to sample skewness and kurtosis. D`Agostino and Stephens states (on page 391) that K^2 is approximately distributed ChiSquareDistribution[2] and that the chi square approximation presents no problems for n >= 100. *) In[4]:= DAgostinoPearsonKSquareTest[data_] := Block[{len = Length[data], ksqrstat}, ksqrstat = susqrtb1[data, len]^2 + agb2[data, len]^2; {KSquareStatistic -> ksqrstat, PValue -> 1 - CDF[ChiSquareDistribution[2], ksqrstat]}] (* Normally distributed data should tend to have pvalues that are not too small. *) In[5]:= DAgostinoPearsonKSquareTest[RandomArray[NormalDistribution[3, 1], 1000]] Out[5]= {KSquareStatistic -> 2.29967, PValue -> 0.316688} (* Data that are far from normal will have small pvalues. *) In[6]:= DAgostinoPearsonKSquareTest[Table[Random[Real, {1, 10}], {1000}]] Out[6]= {KSquareStatistic -> 582.761, PValue -> 0.} For the variance quoted on the TimeSeries page, I initially thought the same thing you did. Assuming the signs are independent and there are equal probabilities of getting positive and negative signs (and 0 probability of getting a 0 difference), the statistic would follow BinomialDistribution[n-1, 1/2], which would have a variance of (n-1)/4. Simulations give a variance that appears to be (n+1)/12 (which would still indicate a typo in the TimeSeries documentation). I haven't figured out why this should be the variance yet. My best guess is that the assumption of independence is not valid given the differencing and as a result the distribution is something other than BinomialDistribution[n-1, 1/2]. Darren Glosemeyer Wolfram Research At 05:15 AM 3/10/2006 -0500, john.hawkin at gmail.com wrote: >Hello, > >I have two questions. > >1. Are there any resources of .nb files available on the internet >where I might find an implementation of the D'Agostino Pearson k^2 test >for normal variates? > >2. In the mathematica time series package (an add-on), the >"difference-sign" test of residuals is mentioned (url: >http://documents.wolfram.com/applications/timeseries/UsersGuidetoTimeSeries/1.6.2.html). > It says that the variance of this test is (n+1) / 2. However, it >would seem to me that a simple calculation gives a variance of (n-1)/4. > It goes as follows: > >If the series is differenced once, then the number of positive and >negative values in the difference should be approximately equal. If Xi >denotes the sign of each value in the differenced series, then >Mean(Xi) = 0.5(1) + 0.5(0) = 0.5 >Var(Xi) = Expectation( (Xi - Mean(Xi))^2 ) >= Expectation( Xi^2 -Xi + 0.25 ) >= 0.5 - 0.5 + 0.25 >= 0.25 > >And assuming independence of each sign from the others, the total >variance should be the sum of the individual variances, up to n-1 for n >data points (since there are only n-1 changes in sign), thus > >Variance = (n-1) / 4 > >There is an equivalent problem in Lemon's "Stochastic Physics" about >coin flips, for which the answer is listed, without proof, as (n-1)/8. >Because of these three conficting results I am wondering if I have made >an error in my calculation, and if anyone can find one please let me >know. > >Thank you very much, > >-John Hawkin