MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Warning -- another Random[] failure

  • To: mathgroup at smc.vnet.net
  • Subject: [mg42051] Re: [mg42014] Warning -- another Random[] failure
  • From: Bobby Treat <drmajorbob at mailblocks.com>
  • Date: Tue, 17 Jun 2003 05:43:46 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

This gets rid of most of the problem:

F[1, n2_] := Table[1, {n2 + 1}];
F[n1_, n2_] /; n1 > n2 := F[n2, n1]
F[n1_, n2_] /; n1 <= n2 := F[n1, n2] = Join[Table[0, {n2}], F[
  n1 - 1, n2]] + Join[F[n1, n2 - 1], Table[0, {n1}]]
n1 = n2 = 14; reps = 2*10^5;
f = Table[0, {n1*n2 + 1}]; L = 1 - n1(n1 + 1)/2;

myRandom := ((Random[Integer, 2^30 - 1]/2^30.) +
 Random[Integer, 2^30 - 1])/2^30.
Do[i = First[Plus @@ Position[
          Last[Transpose[Sort[Join[Table[{myRandom, 0}, {n1}],
        Table[{myRandom, 1}, {n2}]]]]], 0]] + L;
  f[[i]]++, {reps}]

Bobby

-----Original Message-----
From: Ray Koopman <koopman at sfu.ca>
To: mathgroup at smc.vnet.net
Subject: [mg42051] [mg42014] Warning -- another Random[] failure

SUMMARY In the course of some Monte Carlo work with the Mann-Whitney U 
statistic, I have encountered another failure of Random[]: the 
empirical sampling distribution of U in the continuous null case 
differs systematically from its theoretical distribution whenever the 
sample sizes are large enough, where &quot;large enough&quot; depends 
on how many calls to Random[] each sample value needs. If each value 
uses only a single Random[] then the empirical distribution of U will 
be bad if n1 + n2 &gt; 24; if each value needs two calls to Random[] -- 
e.g., if we use Random[]-Random[] to sample from a symmetric triangular 
distribution on (-1,1) -- then the 24 becomes 12; etc. DETAILS The 
following code gets the exact theoretical frequency distribution of U, 
normalized to sum to Multinomial[n1,n2]. It assumes n1 &lt;= n2. 
F[1,n2_] := Table[1,{n2+1}]; F[n1_,n2_] := F[n1,n2] = 
Join[Table[0,{n2}],F[n1-1,n2]] + 
Join[F[Min[n1,n2-1],Max[n1,n2-1]],Table[0,{n1}]] The following code 
creates an empirical frequency distribution of U for &quot;reps&quot; 
pairs of samples, of sizes n1 and n2, from a Uniform(0,1) population. 
Note that it first gets Wilcoxon's rank-sum statistic, then converts to 
U+1 by adding L. f = Table[0,{n1*n2+1}]; L = 1 - n1(n1+1)/2; Do[i = 
First[Plus@@Position[Last[Transpose[Sort[Join[ 
Table[{Random[],0},{n1}], Table[{Random[],1},{n2}]]]]],0]] + L; 
f[[i]]++, {reps}] The expected frequencies are e = 
F[n1,n2]*reps/Plus@@F[n1,n2]; the Pearson residuals and chi-square (df 
= n1*n2) are z = (f-e)/Sqrt[e]; chisqr = N[z.z] To see the problem, 
ListPlot[Transpose[{Range[0,n1*n2],z}], PlotRange-&gt;All, 
Frame-&gt;True] The n1*n2+1 residuals in z should approximate 
independent standard normal deviates, and the display should look 
&quot;random&quot;, with no structure. But consider the following 
empirical distribution, obtained from the above code with n1 = n2 = 14, 
reps = 2*10^6: f = 
{0,0,0,0,1,0,1,0,3,2,6,7,3,3,3,7,14,11,19,17,16,26,45, 
47,57,85,92,127,133,142,203,207,270,338,375,416,492,568, 
688,787,924,1065,1203,1391,1528,1709,2040,2199,2477,2738, 
3152,3502,3825,4276,4787,5326,5830,6312,6922,7599,8168, 
8752,9532,10181,11097,11537,12616,13406,14399,15207,16291, 
17092,18196,19125,20027,21398,22428,23380,24112,25098, 
26199,27319,28448,29375,29922,30638,31633,32225,32758, 
33652,34180,34892,35479,35450,35968,35942,36389,36431, 
36628,36449,36431,35824,35863,35534,35247,34390,34110, 
33735,33158,32069,31358,30521,29859,28884,28133,27173, 
26164,25055,24438,23256,22263,21075,20180,19540,18248, 17170, 
16064,15327,14473,13442,12662,11607,11121,10230, 
9441,8809,8091,7415,6848,6193,5669,5187,4660,4213,3732, 
3408,3102,2830,2484,2196,1992,1763,1536,1367,1232,1029, 
883,821,682,603,507,443,386,328,269,202,186,155,130,96, 
99,69,69,56,34,31,26,25,15,11,15,4,5,3,5,2,2,1,1,0,0,0, 0,0,0,0,0}; 
This f gives chisqr = 653.677, which is far too big, even after 
allowing for expecteds that are less than 5. More importantly, the 
residual plot is strongly patterned: it looks like a &quot;w&quot;, and 
this shape seems to characterize these plots whenever the sample sizes 
are as noted above and the number of reps is large enough to let the 
effect be seen. 


  • Prev by Date: Re: Printing Cellular Automata
  • Next by Date: Total Derivative
  • Previous by thread: Warning -- another Random[] failure
  • Next by thread: Not plotting section of outfitting graphics