Re: Warning -- another Random[] failure
- To: mathgroup at smc.vnet.net
- Subject: [mg42051] Re: [mg42014] Warning -- another Random[] failure
- From: Bobby Treat <drmajorbob at mailblocks.com>
- Date: Tue, 17 Jun 2003 05:43:46 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
This gets rid of most of the problem:
F[1, n2_] := Table[1, {n2 + 1}];
F[n1_, n2_] /; n1 > n2 := F[n2, n1]
F[n1_, n2_] /; n1 <= n2 := F[n1, n2] = Join[Table[0, {n2}], F[
n1 - 1, n2]] + Join[F[n1, n2 - 1], Table[0, {n1}]]
n1 = n2 = 14; reps = 2*10^5;
f = Table[0, {n1*n2 + 1}]; L = 1 - n1(n1 + 1)/2;
myRandom := ((Random[Integer, 2^30 - 1]/2^30.) +
Random[Integer, 2^30 - 1])/2^30.
Do[i = First[Plus @@ Position[
Last[Transpose[Sort[Join[Table[{myRandom, 0}, {n1}],
Table[{myRandom, 1}, {n2}]]]]], 0]] + L;
f[[i]]++, {reps}]
Bobby
-----Original Message-----
From: Ray Koopman <koopman at sfu.ca>
To: mathgroup at smc.vnet.net
Subject: [mg42051] [mg42014] Warning -- another Random[] failure
SUMMARY In the course of some Monte Carlo work with the Mann-Whitney U
statistic, I have encountered another failure of Random[]: the
empirical sampling distribution of U in the continuous null case
differs systematically from its theoretical distribution whenever the
sample sizes are large enough, where "large enough" depends
on how many calls to Random[] each sample value needs. If each value
uses only a single Random[] then the empirical distribution of U will
be bad if n1 + n2 > 24; if each value needs two calls to Random[] --
e.g., if we use Random[]-Random[] to sample from a symmetric triangular
distribution on (-1,1) -- then the 24 becomes 12; etc. DETAILS The
following code gets the exact theoretical frequency distribution of U,
normalized to sum to Multinomial[n1,n2]. It assumes n1 <= n2.
F[1,n2_] := Table[1,{n2+1}]; F[n1_,n2_] := F[n1,n2] =
Join[Table[0,{n2}],F[n1-1,n2]] +
Join[F[Min[n1,n2-1],Max[n1,n2-1]],Table[0,{n1}]] The following code
creates an empirical frequency distribution of U for "reps"
pairs of samples, of sizes n1 and n2, from a Uniform(0,1) population.
Note that it first gets Wilcoxon's rank-sum statistic, then converts to
U+1 by adding L. f = Table[0,{n1*n2+1}]; L = 1 - n1(n1+1)/2; Do[i =
First[Plus@@Position[Last[Transpose[Sort[Join[
Table[{Random[],0},{n1}], Table[{Random[],1},{n2}]]]]],0]] + L;
f[[i]]++, {reps}] The expected frequencies are e =
F[n1,n2]*reps/Plus@@F[n1,n2]; the Pearson residuals and chi-square (df
= n1*n2) are z = (f-e)/Sqrt[e]; chisqr = N[z.z] To see the problem,
ListPlot[Transpose[{Range[0,n1*n2],z}], PlotRange->All,
Frame->True] The n1*n2+1 residuals in z should approximate
independent standard normal deviates, and the display should look
"random", with no structure. But consider the following
empirical distribution, obtained from the above code with n1 = n2 = 14,
reps = 2*10^6: f =
{0,0,0,0,1,0,1,0,3,2,6,7,3,3,3,7,14,11,19,17,16,26,45,
47,57,85,92,127,133,142,203,207,270,338,375,416,492,568,
688,787,924,1065,1203,1391,1528,1709,2040,2199,2477,2738,
3152,3502,3825,4276,4787,5326,5830,6312,6922,7599,8168,
8752,9532,10181,11097,11537,12616,13406,14399,15207,16291,
17092,18196,19125,20027,21398,22428,23380,24112,25098,
26199,27319,28448,29375,29922,30638,31633,32225,32758,
33652,34180,34892,35479,35450,35968,35942,36389,36431,
36628,36449,36431,35824,35863,35534,35247,34390,34110,
33735,33158,32069,31358,30521,29859,28884,28133,27173,
26164,25055,24438,23256,22263,21075,20180,19540,18248, 17170,
16064,15327,14473,13442,12662,11607,11121,10230,
9441,8809,8091,7415,6848,6193,5669,5187,4660,4213,3732,
3408,3102,2830,2484,2196,1992,1763,1536,1367,1232,1029,
883,821,682,603,507,443,386,328,269,202,186,155,130,96,
99,69,69,56,34,31,26,25,15,11,15,4,5,3,5,2,2,1,1,0,0,0, 0,0,0,0,0};
This f gives chisqr = 653.677, which is far too big, even after
allowing for expecteds that are less than 5. More importantly, the
residual plot is strongly patterned: it looks like a "w", and
this shape seems to characterize these plots whenever the sample sizes
are as noted above and the number of reps is large enough to let the
effect be seen.