MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

Warning -- another Random[] failure

  • To: mathgroup at smc.vnet.net
  • Subject: [mg42014] Warning -- another Random[] failure
  • From: koopman at sfu.ca (Ray Koopman)
  • Date: Mon, 16 Jun 2003 03:58:00 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

SUMMARY

In the course of some Monte Carlo work with the Mann-Whitney 
U statistic, I have encountered another failure of Random[]: 
the empirical sampling distribution of U in the continuous null 
case differs systematically from its theoretical distribution 
whenever the sample sizes are large enough, where "large enough" 
depends on how many calls to Random[] each sample value needs. 
If each value uses only a single Random[] then the empirical 
distribution of U will be bad if n1 + n2 > 24; if each value 
needs two calls to Random[] -- e.g., if we use Random[]-Random[] 
to sample from a symmetric triangular distribution on (-1,1) -- 
then the 24 becomes 12; etc.

DETAILS

The following code gets the exact theoretical frequency 
distribution of U, normalized to sum to Multinomial[n1,n2]. 
It assumes n1 <= n2.

  F[1,n2_] := Table[1,{n2+1}]; 
  F[n1_,n2_] := F[n1,n2] = Join[Table[0,{n2}],F[n1-1,n2]] + 
    Join[F[Min[n1,n2-1],Max[n1,n2-1]],Table[0,{n1}]]

The following code creates an empirical frequency distribution 
of U for "reps" pairs of samples, of sizes n1 and n2, from a 
Uniform(0,1) population. Note that it first gets Wilcoxon's 
rank-sum statistic, then converts to U+1 by adding L.

  f = Table[0,{n1*n2+1}]; L = 1 - n1(n1+1)/2;
  Do[i = First[Plus@@Position[Last[Transpose[Sort[Join[
      Table[{Random[],0},{n1}],
      Table[{Random[],1},{n2}]]]]],0]] + L;
    f[[i]]++,
  {reps}]

The expected frequencies are

  e = F[n1,n2]*reps/Plus@@F[n1,n2]; 

the Pearson residuals and chi-square (df = n1*n2) are

  z = (f-e)/Sqrt[e]; chisqr = N[z.z]

To see the problem,

  ListPlot[Transpose[{Range[0,n1*n2],z}],
    PlotRange->All, Frame->True]

The n1*n2+1 residuals in z should approximate independent 
standard normal deviates, and the display should look "random", 
with no structure.

But consider the following empirical distribution, obtained 
from the above code with n1 = n2 = 14, reps = 2*10^6:

f = {0,0,0,0,1,0,1,0,3,2,6,7,3,3,3,7,14,11,19,17,16,26,45,
 47,57,85,92,127,133,142,203,207,270,338,375,416,492,568,
 688,787,924,1065,1203,1391,1528,1709,2040,2199,2477,2738,
 3152,3502,3825,4276,4787,5326,5830,6312,6922,7599,8168,
 8752,9532,10181,11097,11537,12616,13406,14399,15207,16291,
 17092,18196,19125,20027,21398,22428,23380,24112,25098,
 26199,27319,28448,29375,29922,30638,31633,32225,32758,
 33652,34180,34892,35479,35450,35968,35942,36389,36431,
 36628,36449,36431,35824,35863,35534,35247,34390,34110,
 33735,33158,32069,31358,30521,29859,28884,28133,27173,
 26164,25055,24438,23256,22263,21075,20180,19540,18248,
 17170, 16064,15327,14473,13442,12662,11607,11121,10230,
 9441,8809,8091,7415,6848,6193,5669,5187,4660,4213,3732,
 3408,3102,2830,2484,2196,1992,1763,1536,1367,1232,1029,
 883,821,682,603,507,443,386,328,269,202,186,155,130,96,
 99,69,69,56,34,31,26,25,15,11,15,4,5,3,5,2,2,1,1,0,0,0,
 0,0,0,0,0};

This f gives chisqr = 653.677, which is far too big, 
even after allowing for expecteds that are less than 5. 
More importantly, the residual plot is strongly patterned: 
it looks like a "w", and this shape seems to characterize 
these plots whenever the sample sizes are as noted above and 
the number of reps is large enough to let the effect be seen.


  • Prev by Date: Re: InverseFunction[]
  • Next by Date: Re: Re: O(n^2) complexity of ToCycles in Combinatorica
  • Previous by thread: Re: Transfering Packages from Windows to Mac
  • Next by thread: Re: Warning -- another Random[] failure