MathGroup Archive: March 2006 [00303]

[Date Index] [Thread Index] [Author Index]

Re: The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test

To: mathgroup at smc.vnet.net
Subject: [mg65018] Re: [mg64990] The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test
From: Darren Glosemeyer <darreng at wolfram.com>
Date: Sat, 11 Mar 2006 05:16:06 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

Here is an implementation of the D`Agostino Pearson K^2 test based on the 
discussion in
R. B. D'Agostino and M. A. Stephens, editors. Goodness-of-Fit Techniques. 
Marcel Dekker, Inc., 1986.

In[1]:= << Statistics`

(* This defines the SU standard normal approximation for sample skewness
given on page 377 of D`Agostino and Stephens. *)

In[2]:= susqrtb1[data_, len_] := Block[{y, beta2, wsq, delta, alpha},
           y = Skewness[data] Sqrt[(len + 1) (len + 3)/(6 (len - 2))];
           beta2 = 3 (len^2 + 27 len - 70) (len +
               1) (len + 3)/((len - 2) (len + 5) (len + 7) (len + 9));
           wsq = Sqrt[2 (beta2 - 1)] - 1;
           delta = 1/Sqrt[Log[wsq]];
           (* In the text, alpha is Sqrt[2/(wsq - 1)],
           but that appears to be a typo. alpha = Sqrt[1/(wsq - 1)]
             gives values consistent with a standard normal. *)
           alpha = Sqrt[1/(wsq - 1)];
           delta*Log[y/alpha + Sqrt[(y/alpha)^2 + 1]]]

(* This defines the Anscombe Glynn standard normal approximation for sample
kurtosis given on page 388 of D`Agostino and Stephens. *)

In[3]:= agb2[data_, len_] := Block[{b2, eb2, sdb2, xx, sqrtbeta1b2, aa},
           b2 = Kurtosis[data];
           eb2 = 3 (len - 1)/(len + 1);
           sdb2 = Sqrt[24 len (len - 2) (len - 3)/((len + 1)^2 (len + 3) 
(len + 5))];
           xx = (b2 - eb2)/sdb2;
           sqrtbeta1b2 =
            6 (len^2 - 5 len + 2)/((len + 7) (len + 9))*
             Sqrt[6 (len + 3) (len + 5)/(len (len - 2) (len - 3))];
           aa = 6 + 8/sqrtbeta1b2 (2/sqrtbeta1b2 + Sqrt[1 + 4/sqrtbeta1b2^2]);
           (1 - 2/(9 aa) - ((1 - 2/aa)/(1 + xx*Sqrt[2/(aa - 4)]))^(1/3))/
            Sqrt[2/(9 aa)]]

(* The D`Agostino and Pearson K^2 test statistic is the sum of squares of the
normal approximations to sample skewness and kurtosis. D`Agostino and Stephens
states (on page 391) that K^2 is approximately distributed
ChiSquareDistribution[2] and that the chi square approximation presents no 
problems
for n >= 100. *)

In[4]:= DAgostinoPearsonKSquareTest[data_] :=
          Block[{len = Length[data], ksqrstat},
           ksqrstat = susqrtb1[data, len]^2 + agb2[data, len]^2;
           {KSquareStatistic -> ksqrstat,
            PValue -> 1 - CDF[ChiSquareDistribution[2], ksqrstat]}]

(* Normally distributed data should tend to have pvalues that are not too 
small. *)

In[5]:= DAgostinoPearsonKSquareTest[RandomArray[NormalDistribution[3, 1], 
1000]]

Out[5]= {KSquareStatistic -> 2.29967, PValue -> 0.316688}

(* Data that are far from normal will have small pvalues. *)

In[6]:= DAgostinoPearsonKSquareTest[Table[Random[Real, {1, 10}], {1000}]]

Out[6]= {KSquareStatistic -> 582.761, PValue -> 0.}

For the variance quoted on the TimeSeries page, I initially thought the 
same thing you did. Assuming the signs are independent and there are equal 
probabilities of getting positive and negative signs (and 0 probability of 
getting a 0 difference), the statistic would follow 
BinomialDistribution[n-1, 1/2], which would have a variance of 
(n-1)/4.  Simulations give a variance that appears to be (n+1)/12 (which 
would still indicate a typo in the TimeSeries documentation).  I haven't 
figured out why this should be the variance yet.  My best guess is that the 
assumption of independence is not valid given the differencing and as a 
result the distribution is something other than BinomialDistribution[n-1, 1/2].

Darren Glosemeyer
Wolfram Research

At 05:15 AM 3/10/2006 -0500, john.hawkin at gmail.com wrote:
>Hello,
>
>I have two questions.
>
>1.  Are there any resources of .nb files available on the internet
>where I might find an implementation of the D'Agostino Pearson k^2 test
>for normal variates?
>
>2.  In the mathematica time series package (an add-on), the
>"difference-sign" test of residuals is mentioned (url:
>http://documents.wolfram.com/applications/timeseries/UsersGuidetoTimeSeries/1.6.2.html).
>  It says that the variance of this test is (n+1) / 2.  However, it
>would seem to me that a simple calculation gives a variance of (n-1)/4.
>  It goes as follows:
>
>If the series is differenced once, then the number of positive and
>negative values in the difference should be approximately equal.  If Xi
>denotes the sign of each value in the differenced series, then
>Mean(Xi) = 0.5(1) + 0.5(0) = 0.5
>Var(Xi) = Expectation( (Xi - Mean(Xi))^2 )
>= Expectation( Xi^2 -Xi + 0.25 )
>= 0.5 - 0.5 + 0.25
>= 0.25
>
>And assuming independence of each sign from the others, the total
>variance should be the sum of the individual variances, up to n-1 for n
>data points (since there are only n-1 changes in sign), thus
>
>Variance = (n-1) / 4
>
>There is an equivalent problem in Lemon's "Stochastic Physics" about
>coin flips, for which the answer is listed, without proof, as (n-1)/8.
>Because of these three conficting results I am wondering if I have made
>an error in my calculation, and if anyone can find one please let me
>know.
>
>Thank you very much,
>
>-John Hawkin

Prev by Date: Re: Re: Mathematica and Education

Next by Date: Re: 3D-plot over a triangle

Previous by thread: Re: The D'Agostino Pearson k^2 test implemented in mathematica / variance of difference sign test

Next by thread: TriangularSurfacePlot for quadrangles ??