MathGroup Archive: March 2006 [00503]


Re: Re: variance of difference sign test

To: mathgroup at smc.vnet.net
Subject: [mg65209] Re: [mg65197] Re: variance of difference sign test
From: leigh pascoe <leigh at cephb.fr>
Date: Sat, 18 Mar 2006 06:40:31 -0500 (EST)
References: <duudcl$hug$1@smc.vnet.net> <200603171045.FAA15647@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

ab_def at prontomail.com wrote:
> Suppose that we have n independent identically distributed random
> variables {u[1], ..., u[n]} and P[u[i] == u[j]] == 0 for i != j. We
> form another sequence {xi[1] = Boole[u[1] > u[2]], ..., xi[n - 1] =
> Boole[u[n - 1] > u[n]]} and we're looking for the variance of the sum
> of xi[i]:
>
> D[N[n]] == Variance[Sum[xi[i], {i, n - 1}]] ==
>
>   Variance[Sum[xi[i], {i, n - 2}] + xi[n - 1]] ==
>
>   Variance[Sum[xi[i], {i, n - 2}]] + Variance[xi[n - 1]] +
>
>     2*Covariance[Sum[xi[i], {i, n - 2}], xi[n - 1]] ==
>
>   D[N[n - 1]] + 1/4 + 2*Sum[Covariance[xi[i], xi[n - 1]], {i, n - 2}]
>
> For any pair of adjacent elements we have
>
> Covariance[xi[1], xi[2]] ==
>
>   P[xi[1] == 1 && xi[2] == 1] - P[xi[1] == 1]*P[xi[2] == 1] ==
>
>   P[u[1] > u[2] > u[3]] - P[u[1] > u[2]]*P[u[2] > u[3]] ==
>
>   1/6 - 1/4 == -1/12
>
> because all permutations of {u[1], ..., u[n]} are equally probable. For
> any non-adjacent elements Covariance[xi[i], xi[j]] == 0, since xi[i] and
> xi[j] then depend on disjoint sets of the independent u's. Therefore,
>
> D[N[n]] == D[N[n - 1]] + 1/4 + 2*(-1/12), D[N[2]] == 1/4
>
> and D[N[n]] == (n + 1)/12 if n >= 2.
>
> Here is a check for n = 6:
>
> In[1]:= n = 6;
>
> Lvalfreq = {First@ #, Length@ #}& /@ Split@ Sort@
>   (Count[Sign[Most@ # - Rest@ #], 1]& /@
>     Permutations@ Range@ n)
>
> {Lval, Lp} = {Lvalfreq[[All, 1]], Lvalfreq[[All, 2]]/n!};
> mu = Lval.Lp
> sigma = ((Lval - mu)^2).Lp
>
> Out[2]= {{0, 1}, {1, 57}, {2, 302}, {3, 302}, {4, 57}, {5, 1}}
>
> Out[4]= 5/2
>
> Out[5]= 7/12
>
> And a numerical test:
>
> In[6]:= Lcnt = Array[
>   Count[Sign[Most@ # - Rest@ #]&@ Array[Random[]&, n], 1]&,
>   10^5];
>
> {Mean@ Lcnt, Variance@ Lcnt} - {mu, sigma} // N
>
> Out[7]= {0.00262, 0.0033856695}
>
> Maxim Rytin
> m.r at inbox.ru
>
> Darren Glosemeyer wrote:
>   
>> For the variance quoted on the TimeSeries page, I initially thought the
>> same thing you did. Assuming the signs are independent and there are equal
>> probabilities of getting positive and negative signs (and 0 probability of
>> getting a 0 difference), the statistic would follow
>> BinomialDistribution[n-1, 1/2], which would have a variance of
>> (n-1)/4.  Simulations give a variance that appears to be (n+1)/12 (which
>> would still indicate a typo in the TimeSeries documentation).  I haven't
>> figured out why this should be the variance yet.  My best guess is that the
>> assumption of independence is not valid given the differencing and as a
>> result the distribution is something other than BinomialDistribution[n-1, 1/2].
>>
>>
>> Darren Glosemeyer
>> Wolfram Research
>>
>>
>> At 05:15 AM 3/10/2006 -0500, john.hawkin at gmail.com wrote:
>>     
>>> Hello,
>>>
>>> I have two questions.
>>>
>>> 1.  Are there any resources of .nb files available on the internet
>>> where I might find an implementation of the D'Agostino Pearson k^2 test
>>> for normal variates?
>>>
>>> 2.  In the mathematica time series package (an add-on), the
>>> "difference-sign" test of residuals is mentioned (url:
>>> http://documents.wolfram.com/applications/timeseries/UsersGuidetoTimeSeries/1.6.2.html).
>>>  It says that the variance of this test is (n+1) / 2.  However, it
>>> would seem to me that a simple calculation gives a variance of (n-1)/4.
>>>  It goes as follows:
>>>
>>> If the series is differenced once, then the number of positive and
>>> negative values in the difference should be approximately equal.  If Xi
>>> denotes the sign of each value in the differenced series, then
>>> Mean(Xi) = 0.5(1) + 0.5(0) = 0.5
>>> Var(Xi) = Expectation( (Xi - Mean(Xi))^2 )
>>> = Expectation( Xi^2 -Xi + 0.25 )
>>> = 0.5 - 0.5 + 0.25
>>> = 0.25
>>>
>>> And assuming independence of each sign from the others, the total
>>> variance should be the sum of the individual variances, up to n-1 for n
>>> data points (since there are only n-1 changes in sign), thus
>>>
>>> Variance = (n-1) / 4
>>>
>>> There is an equivalent problem in Lemon's "Stochastic Physics" about
>>> coin flips, for which the answer is listed, without proof, as (n-1)/8.
>>> Because of these three conflicting results I am wondering if I have made
>>> an error in my calculation, and if anyone can find one please let me
>>> know.
>>>
>>> Thank you very much,
>>>
>>> -John Hawkin
>>>       
>
When you define the sign test in this way, adjacent terms are indeed not
independent: a high residual value is more likely to be followed by a
lower one, for example. However, this is a strange way to do a sign
test. Normally you would be interested in the deviation of the model
from the observation, i.e. the residual itself. In that case it is the
sign of the residual that is of interest, and under the null hypothesis
that your time series model is correct it is equally likely to be
positive or negative. Thus, if you assign a value of one or zero
according to the sign of the residual, you have a series of independent
and identically distributed Bernoulli variables with p = 0.5. Because
the variables are independent, the covariance of any two distinct terms
is zero, so the variance of the sum is the sum of the individual
variances. The mean and variance of the sum would then be as calculated
by John.
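
As a rough check of the distinction, here is a minimal sketch that
enumerates both statistics exactly for n = 6. The helper popVariance and
the names diffCounts and signCounts are illustrative and do not come from
the thread. It also assumes n - 1 residual signs, so that the count
matches John's formula; with one sign per residual there would be n terms
and the variance would be n/4.

```mathematica
(* Population variance of a complete list of equally likely outcomes.
   Illustrative helper, not part of any package. *)
popVariance[list_] := Mean[(list - Mean[list])^2]

n = 6;

(* Difference-sign statistic: number of positions i with u[i] > u[i+1],
   enumerated over all n! equally likely orderings. *)
diffCounts = Count[Sign[Most[#] - Rest[#]], 1] & /@ Permutations[Range[n]];

(* Residual-sign statistic: number of positive signs among n - 1
   independent signs (assumed count), enumerated over all 2^(n-1)
   equally likely sign patterns. *)
signCounts = Total /@ Tuples[{0, 1}, n - 1];

(* Compare each exact variance with the corresponding formula. *)
{popVariance[diffCounts], (n + 1)/12}
{popVariance[signCounts], (n - 1)/4}
```

The first pair should come out as 7/12 on both sides, in agreement with
Maxim's Out[5], and the second as 5/4, the variance of
BinomialDistribution[5, 1/2]. The population variance is computed by hand
because the built-in Variance divides by one less than the list length.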

This seems to me to be a more appropriate way to test the fit of a model.

LP

References:
- Re: variance of difference sign test
  - From: ab_def@prontomail.com
