MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Kolmogorov-Smirnov 2-sample test

  • To: mathgroup at smc.vnet.net
  • Subject: [mg111227] Re: Kolmogorov-Smirnov 2-sample test
  • From: Ray Koopman <koopman at sfu.ca>
  • Date: Sat, 24 Jul 2010 05:07:13 -0400 (EDT)
  • References: <i2bt8f$b8g$1@smc.vnet.net>

On Jul 23, 4:09 am, Andy Ross <an... at wolfram.com> wrote:
>
> I've corrected the error in my code from before.  The p-value
> computation was giving low estimates because I was using RandomChoice
> rather than RandomSample. I believe this should do the job (though
> rather slowly).
>
> empiricalCDF[data_, x_] := Length[Select[data, # <= x &]]/Length[data]
>
> splitAtN1[udat_, n1_] := {udat[[1 ;; n1]], udat[[n1 + 1 ;; -1]]}
>
> KolmogorovSmirnov2Sample[data1_, data2_] :=
>   Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2,
>     udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
>     n2 = Length[data2], T},
>    e1 = empiricalCDF[sd1, #] & /@ udat;
>    e2 = empiricalCDF[sd2, #] & /@ udat;
>    T = Max[Abs[e1 - e2]];
>    (Sqrt[(n1*n2)/(n1 + n2)]) T // N
>    ]
>
> KS2BootStrapPValue[data1_, data2_, T_, MCSamp_] :=
>   Block[{n1 = Length[data1], udat = Join[data1, data2], dfts},
>    dfts = ConstantArray[0, MCSamp];
>    Do[
>     dfts[[i]] =
>      KolmogorovSmirnov2Sample @@ splitAtN1[RandomSample[udat], n1]
>     , {i, MCSamp}
>     ];
>    Length[Select[dfts, # >= T &]]/MCSamp // N
>    ]
>
> Example:
>
> data1 = {0.386653, 1.10925, 0.871822, -0.266199, 2.00516, -1.48574,
> -0.68592, -0.0461418, -0.29906, 0.209381};
>
> data2 = {-0.283594, -1.08097, 0.915052, 0.448915, -0.88062, -0.140511,
> -0.0812646, -1.1592, 0.138245, -0.314907};
>
> In[41]:= KolmogorovSmirnov2Sample[data1, data2]
>
> Out[41]= 0.67082
>
> Using 1000 bootstrap samples...
>
> In[42]:= KS2BootStrapPValue[data1, data2, .67082, 1000]
>
> Out[42]= 0.791
>
> -Andy

The ugly procedural code in ks2 returns the exact p-value very
quickly:

AbsoluteTiming@ks2[data1,data2]

{0.005842 Second,{30,0.78693}}


  • Prev by Date: Re: surprising comparison of Mathematica 5.2 and 7.0
  • Next by Date: Re: A Question About Directive
  • Previous by thread: Re: Kolmogorov-Smirnov 2-sample test
  • Next by thread: Re: Kolmogorov-Smirnov 2-sample test