Re: The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test

• To: mathgroup at smc.vnet.net
• Subject: [mg65018] Re: [mg64990] The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Sat, 11 Mar 2006 05:16:06 -0500 (EST)
• Sender: owner-wri-mathgroup at wolfram.com

```Here is an implementation of the D`Agostino Pearson K^2 test based on the
discussion in
R. B. D'Agostino and M. A. Stephens, editors. Goodness-of-Fit Techniques.
Marcel Dekker, Inc., 1986.

In[1]:= << Statistics`

(* This defines the SU standard normal approximation for sample skewness
given on page 377 of D`Agostino and Stephens. *)

In[2]:= susqrtb1[data_, len_] := Block[{y, beta2, wsq, delta, alpha},
y = Skewness[data] Sqrt[(len + 1) (len + 3)/(6 (len - 2))];
beta2 = 3 (len^2 + 27 len - 70) (len +
1) (len + 3)/((len - 2) (len + 5) (len + 7) (len + 9));
wsq = Sqrt[2 (beta2 - 1)] - 1;
delta = 1/Sqrt[Log[wsq]];
(* In the text, alpha is Sqrt[2/(wsq - 1)],
but that appears to be a typo. alpha = Sqrt[1/(wsq - 1)]
gives values consistent with a standard normal. *)
alpha = Sqrt[1/(wsq - 1)];
delta*Log[y/alpha + Sqrt[(y/alpha)^2 + 1]]]

(* This defines the Anscombe Glynn standard normal approximation for sample
kurtosis given on page 388 of D`Agostino and Stephens. *)

In[3]:= agb2[data_, len_] := Block[{b2, eb2, sdb2, xx, sqrtbeta1b2, aa},
b2 = Kurtosis[data];
eb2 = 3 (len - 1)/(len + 1);
sdb2 = Sqrt[24 len (len - 2) (len - 3)/((len + 1)^2 (len + 3)
(len + 5))];
xx = (b2 - eb2)/sdb2;
sqrtbeta1b2 =
6 (len^2 - 5 len + 2)/((len + 7) (len + 9))*
Sqrt[6 (len + 3) (len + 5)/(len (len - 2) (len - 3))];
aa = 6 + 8/sqrtbeta1b2 (2/sqrtbeta1b2 + Sqrt[1 + 4/sqrtbeta1b2^2]);
(1 - 2/(9 aa) - ((1 - 2/aa)/(1 + xx*Sqrt[2/(aa - 4)]))^(1/3))/
Sqrt[2/(9 aa)]]

(* The D`Agostino and Pearson K^2 test statistic is the sum of squares of the
normal approximations to sample skewness and kurtosis. D`Agostino and Stephens
states (on page 391) that K^2 is approximately distributed
ChiSquareDistribution[2] and that the chi square approximation presents no
problems
for n >= 100. *)

In[4]:= DAgostinoPearsonKSquareTest[data_] :=
Block[{len = Length[data], ksqrstat},
ksqrstat = susqrtb1[data, len]^2 + agb2[data, len]^2;
{KSquareStatistic -> ksqrstat,
PValue -> 1 - CDF[ChiSquareDistribution[2], ksqrstat]}]

(* Normally distributed data should tend to have pvalues that are not too
small. *)

In[5]:= DAgostinoPearsonKSquareTest[RandomArray[NormalDistribution[3, 1],
1000]]

Out[5]= {KSquareStatistic -> 2.29967, PValue -> 0.316688}

(* Data that are far from normal will have small pvalues. *)

In[6]:= DAgostinoPearsonKSquareTest[Table[Random[Real, {1, 10}], {1000}]]

Out[6]= {KSquareStatistic -> 582.761, PValue -> 0.}

For the variance quoted on the TimeSeries page, I initially thought the
same thing you did. Assuming the signs are independent and there are equal
probabilities of getting positive and negative signs (and 0 probability of
getting a 0 difference), the statistic would follow
BinomialDistribution[n-1, 1/2], which would have a variance of
(n-1)/4.  Simulations give a variance that appears to be (n+1)/12 (which
would still indicate a typo in the TimeSeries documentation).  I haven't
figured out why this should be the variance yet.  My best guess is that the
assumption of independence is not valid given the differencing and as a
result the distribution is something other than BinomialDistribution[n-1, 1/2].

Darren Glosemeyer
Wolfram Research

At 05:15 AM 3/10/2006 -0500, john.hawkin at gmail.com wrote:
>Hello,
>
>I have two questions.
>
>1.  Are there any resources of .nb files available on the internet
>where I might find an implementation of the D'Agostino Pearson k^2 test
>for normal variates?
>
>2.  In the mathematica time series package (an add-on), the
>"difference-sign" test of residuals is mentioned (url:
>http://documents.wolfram.com/applications/timeseries/UsersGuidetoTimeSeries/1.6.2.html).
>  It says that the variance of this test is (n+1) / 2.  However, it
>would seem to me that a simple calculation gives a variance of (n-1)/4.
>  It goes as follows:
>
>If the series is differenced once, then the number of positive and
>negative values in the difference should be approximately equal.  If Xi
>denotes the sign of each value in the differenced series, then
>Mean(Xi) = 0.5(1) + 0.5(0) = 0.5
>Var(Xi) = Expectation( (Xi - Mean(Xi))^2 )
>= Expectation( Xi^2 -Xi + 0.25 )
>= 0.5 - 0.5 + 0.25
>= 0.25
>
>And assuming independence of each sign from the others, the total
>variance should be the sum of the individual variances, up to n-1 for n
>data points (since there are only n-1 changes in sign), thus
>
>Variance = (n-1) / 4
>
>There is an equivalent problem in Lemon's "Stochastic Physics" about
>coin flips, for which the answer is listed, without proof, as (n-1)/8.
>Because of these three conficting results I am wondering if I have made
>an error in my calculation, and if anyone can find one please let me
>know.
>
>Thank you very much,
>
>-John Hawkin

```

• Prev by Date: Re: Re: Mathematica and Education
• Next by Date: Re: 3D-plot over a triangle
• Previous by thread: Re: The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test