MathGroup Archive: June 2003 [00311]

[Date Index] [Thread Index] [Author Index]

Warning -- another Random[] failure

To: mathgroup at smc.vnet.net
Subject: [mg42014] Warning -- another Random[] failure
From: koopman at sfu.ca (Ray Koopman)
Date: Mon, 16 Jun 2003 03:58:00 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

SUMMARY

In the course of some Monte Carlo work with the Mann-Whitney 
U statistic, I have encountered another failure of Random[]: 
the empirical sampling distribution of U in the continuous null 
case differs systematically from its theoretical distribution 
whenever the sample sizes are large enough, where "large enough" 
depends on how many calls to Random[] each sample value needs. 
If each value uses only a single Random[] then the empirical 
distribution of U will be bad if n1 + n2 > 24; if each value 
needs two calls to Random[] -- e.g., if we use Random[]-Random[] 
to sample from a symmetric triangular distribution on (-1,1) -- 
then the 24 becomes 12; etc.

DETAILS

The following code gets the exact theoretical frequency 
distribution of U, normalized to sum to Multinomial[n1,n2]. 
It assumes n1 <= n2.

  F[1,n2_] := Table[1,{n2+1}]; 
  F[n1_,n2_] := F[n1,n2] = Join[Table[0,{n2}],F[n1-1,n2]] + 
    Join[F[Min[n1,n2-1],Max[n1,n2-1]],Table[0,{n1}]]

The following code creates an empirical frequency distribution 
of U for "reps" pairs of samples, of sizes n1 and n2, from a 
Uniform(0,1) population. Note that it first gets Wilcoxon's 
rank-sum statistic, then converts to U+1 by adding L.

  f = Table[0,{n1*n2+1}]; L = 1 - n1(n1+1)/2;
  Do[i = First[Plus@@Position[Last[Transpose[Sort[Join[
      Table[{Random[],0},{n1}],
      Table[{Random[],1},{n2}]]]]],0]] + L;
    f[[i]]++,
  {reps}]

The expected frequencies are

  e = F[n1,n2]*reps/Plus@@F[n1,n2]; 

the Pearson residuals and chi-square (df = n1*n2) are

  z = (f-e)/Sqrt[e]; chisqr = N[z.z]

To see the problem,

  ListPlot[Transpose[{Range[0,n1*n2],z}],
    PlotRange->All, Frame->True]

The n1*n2+1 residuals in z should approximate independent 
standard normal deviates, and the display should look "random", 
with no structure.

But consider the following empirical distribution, obtained 
from the above code with n1 = n2 = 14, reps = 2*10^6:

f = {0,0,0,0,1,0,1,0,3,2,6,7,3,3,3,7,14,11,19,17,16,26,45,
 47,57,85,92,127,133,142,203,207,270,338,375,416,492,568,
 688,787,924,1065,1203,1391,1528,1709,2040,2199,2477,2738,
 3152,3502,3825,4276,4787,5326,5830,6312,6922,7599,8168,
 8752,9532,10181,11097,11537,12616,13406,14399,15207,16291,
 17092,18196,19125,20027,21398,22428,23380,24112,25098,
 26199,27319,28448,29375,29922,30638,31633,32225,32758,
 33652,34180,34892,35479,35450,35968,35942,36389,36431,
 36628,36449,36431,35824,35863,35534,35247,34390,34110,
 33735,33158,32069,31358,30521,29859,28884,28133,27173,
 26164,25055,24438,23256,22263,21075,20180,19540,18248,
 17170, 16064,15327,14473,13442,12662,11607,11121,10230,
 9441,8809,8091,7415,6848,6193,5669,5187,4660,4213,3732,
 3408,3102,2830,2484,2196,1992,1763,1536,1367,1232,1029,
 883,821,682,603,507,443,386,328,269,202,186,155,130,96,
 99,69,69,56,34,31,26,25,15,11,15,4,5,3,5,2,2,1,1,0,0,0,
 0,0,0,0,0};

This f gives chisqr = 653.677, which is far too big, 
even after allowing for expecteds that are less than 5. 
More importantly, the residual plot is strongly patterned: 
it looks like a "w", and this shape seems to characterize 
these plots whenever the sample sizes are as noted above and 
the number of reps is large enough to let the effect be seen.

Prev by Date: Re: InverseFunction[]

Next by Date: Re: Re: O(n^2) complexity of ToCycles in Combinatorica

Previous by thread: Re: Transfering Packages from Windows to Mac

Next by thread: Re: Warning -- another Random[] failure