Re: Kolmogorov-Smirnov 2-sample test
- To: mathgroup at smc.vnet.net
- Subject: [mg111227] Re: Kolmogorov-Smirnov 2-sample test
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sat, 24 Jul 2010 05:07:13 -0400 (EDT)
- References: <i2bt8f$b8g$1@smc.vnet.net>
On Jul 23, 4:09 am, Andy Ross <an... at wolfram.com> wrote:
>
> I've corrected the error in my code from before. The p-value
> computation was giving low estimates because I was using RandomChoice
> rather than RandomSample. I believe this should do the job (though
> rather slowly).
>
> empiricalCDF[data_, x_] := Length[Select[data, # <= x &]]/Length[data]
>
> splitAtN1[udat_, n1_] := {udat[[1 ;; n1]], udat[[n1 + 1 ;; -1]]}
>
> KolmogorovSmirnov2Sample[data1_, data2_] :=
> Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2,
> udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
> n2 = Length[data2], T},
> e1 = empiricalCDF[sd1, #] & /@ udat;
> e2 = empiricalCDF[sd2, #] & /@ udat;
> T = Max[Abs[e1 - e2]];
> (Sqrt[(n1*n2)/(n1 + n2)]) T // N
> ]
>
> KS2BootStrapPValue[data1_, data2_, T_, MCSamp_] :=
> Block[{n1 = Length[data1], udat = Join[data1, data2], dfts},
> dfts = ConstantArray[0, MCSamp];
> Do[
> dfts[[i]] =
> KolmogorovSmirnov2Sample @@ splitAtN1[RandomSample[udat], n1]
> , {i, MCSamp}
> ];
> Length[Select[dfts, # >= T &]]/MCSamp // N
> ]
>
> Example:
>
> data1 = {0.386653, 1.10925, 0.871822, -0.266199, 2.00516, -1.48574,
> -0.68592, -0.0461418, -0.29906, 0.209381};
>
> data2 = {-0.283594, -1.08097, 0.915052, 0.448915, -0.88062, -0.140511,
> -0.0812646, -1.1592, 0.138245, -0.314907};
>
> In[41]:= KolmogorovSmirnov2Sample[data1, data2]
>
> Out[41]= 0.67082
>
> Using 1000 bootstrap samples...
>
> In[42]:= KS2BootStrapPValue[data1, data2, .67082, 1000]
>
> Out[42]= 0.791
>
> -Andy
The ugly procedural code in ks2 returns the exact p-value very
quickly:
AbsoluteTiming@ks2[data1,data2]
{0.005842 Second,{30,0.78693}}