MathGroup Archive: June 2003 [00359]

[Date Index] [Thread Index] [Author Index]

Re: Warning -- another Random[] failure

To: mathgroup at smc.vnet.net
Subject: [mg42051] Re: [mg42014] Warning -- another Random[] failure
From: Bobby Treat <drmajorbob at mailblocks.com>
Date: Tue, 17 Jun 2003 05:43:46 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com

This gets rid of most of the problem:

F[1, n2_] := Table[1, {n2 + 1}];
F[n1_, n2_] /; n1 > n2 := F[n2, n1]
F[n1_, n2_] /; n1 <= n2 := F[n1, n2] = Join[Table[0, {n2}], F[
  n1 - 1, n2]] + Join[F[n1, n2 - 1], Table[0, {n1}]]
n1 = n2 = 14; reps = 2*10^5;
f = Table[0, {n1*n2 + 1}]; L = 1 - n1(n1 + 1)/2;

myRandom := ((Random[Integer, 2^30 - 1]/2^30.) +
 Random[Integer, 2^30 - 1])/2^30.
Do[i = First[Plus @@ Position[
          Last[Transpose[Sort[Join[Table[{myRandom, 0}, {n1}],
        Table[{myRandom, 1}, {n2}]]]]], 0]] + L;
  f[[i]]++, {reps}]

Bobby

-----Original Message-----
From: Ray Koopman <koopman at sfu.ca>
To: mathgroup at smc.vnet.net
Subject: [mg42051] [mg42014] Warning -- another Random[] failure

SUMMARY In the course of some Monte Carlo work with the Mann-Whitney U 
statistic, I have encountered another failure of Random[]: the 
empirical sampling distribution of U in the continuous null case 
differs systematically from its theoretical distribution whenever the 
sample sizes are large enough, where &quot;large enough&quot; depends 
on how many calls to Random[] each sample value needs. If each value 
uses only a single Random[] then the empirical distribution of U will 
be bad if n1 + n2 &gt; 24; if each value needs two calls to Random[] -- 
e.g., if we use Random[]-Random[] to sample from a symmetric triangular 
distribution on (-1,1) -- then the 24 becomes 12; etc. DETAILS The 
following code gets the exact theoretical frequency distribution of U, 
normalized to sum to Multinomial[n1,n2]. It assumes n1 &lt;= n2. 
F[1,n2_] := Table[1,{n2+1}]; F[n1_,n2_] := F[n1,n2] = 
Join[Table[0,{n2}],F[n1-1,n2]] + 
Join[F[Min[n1,n2-1],Max[n1,n2-1]],Table[0,{n1}]] The following code 
creates an empirical frequency distribution of U for &quot;reps&quot; 
pairs of samples, of sizes n1 and n2, from a Uniform(0,1) population. 
Note that it first gets Wilcoxon's rank-sum statistic, then converts to 
U+1 by adding L. f = Table[0,{n1*n2+1}]; L = 1 - n1(n1+1)/2; Do[i = 
First[Plus@@Position[Last[Transpose[Sort[Join[ 
Table[{Random[],0},{n1}], Table[{Random[],1},{n2}]]]]],0]] + L; 
f[[i]]++, {reps}] The expected frequencies are e = 
F[n1,n2]*reps/Plus@@F[n1,n2]; the Pearson residuals and chi-square (df 
= n1*n2) are z = (f-e)/Sqrt[e]; chisqr = N[z.z] To see the problem, 
ListPlot[Transpose[{Range[0,n1*n2],z}], PlotRange-&gt;All, 
Frame-&gt;True] The n1*n2+1 residuals in z should approximate 
independent standard normal deviates, and the display should look 
&quot;random&quot;, with no structure. But consider the following 
empirical distribution, obtained from the above code with n1 = n2 = 14, 
reps = 2*10^6: f = 
{0,0,0,0,1,0,1,0,3,2,6,7,3,3,3,7,14,11,19,17,16,26,45, 
47,57,85,92,127,133,142,203,207,270,338,375,416,492,568, 
688,787,924,1065,1203,1391,1528,1709,2040,2199,2477,2738, 
3152,3502,3825,4276,4787,5326,5830,6312,6922,7599,8168, 
8752,9532,10181,11097,11537,12616,13406,14399,15207,16291, 
17092,18196,19125,20027,21398,22428,23380,24112,25098, 
26199,27319,28448,29375,29922,30638,31633,32225,32758, 
33652,34180,34892,35479,35450,35968,35942,36389,36431, 
36628,36449,36431,35824,35863,35534,35247,34390,34110, 
33735,33158,32069,31358,30521,29859,28884,28133,27173, 
26164,25055,24438,23256,22263,21075,20180,19540,18248, 17170, 
16064,15327,14473,13442,12662,11607,11121,10230, 
9441,8809,8091,7415,6848,6193,5669,5187,4660,4213,3732, 
3408,3102,2830,2484,2196,1992,1763,1536,1367,1232,1029, 
883,821,682,603,507,443,386,328,269,202,186,155,130,96, 
99,69,69,56,34,31,26,25,15,11,15,4,5,3,5,2,2,1,1,0,0,0, 0,0,0,0,0}; 
This f gives chisqr = 653.677, which is far too big, even after 
allowing for expecteds that are less than 5. More importantly, the 
residual plot is strongly patterned: it looks like a &quot;w&quot;, and 
this shape seems to characterize these plots whenever the sample sizes 
are as noted above and the number of reps is large enough to let the 
effect be seen.

Prev by Date: Re: Printing Cellular Automata

Next by Date: Total Derivative

Previous by thread: Warning -- another Random[] failure

Next by thread: Not plotting section of outfitting graphics