Re: Kolmogorov-Smirnov 2-sample test
- To: mathgroup at smc.vnet.net
- Subject: [mg111227] Re: Kolmogorov-Smirnov 2-sample test
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sat, 24 Jul 2010 05:07:13 -0400 (EDT)
- References: <i2bt8f$b8g$1@smc.vnet.net>
On Jul 23, 4:09 am, Andy Ross <an... at wolfram.com> wrote: > > I've corrected the error in my code from before. The p-value > computation was giving low estimates because I was using RandomChoice > rather than RandomSample. I believe this should do the job (though > rather slowly). > > empiricalCDF[data_, x_] := Length[Select[data, # <= x &]]/Length[data] > > splitAtN1[udat_, n1_] := {udat[[1 ;; n1]], udat[[n1 + 1 ;; -1]]} > > KolmogorovSmirnov2Sample[data1_, data2_] := > Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2, > udat = Union[Flatten[{data1, data2}]], n1 = Length[data1], > n2 = Length[data2], T}, > e1 = empiricalCDF[sd1, #] & /@ udat; > e2 = empiricalCDF[sd2, #] & /@ udat; > T = Max[Abs[e1 - e2]]; > (Sqrt[(n1*n2)/(n1 + n2)]) T // N > ] > > KS2BootStrapPValue[data1_, data2_, T_, MCSamp_] := > Block[{n1 = Length[data1], udat = Join[data1, data2], dfts}, > dfts = ConstantArray[0, MCSamp]; > Do[ > dfts[[i]] = > KolmogorovSmirnov2Sample @@ splitAtN1[RandomSample[udat], n1] > , {i, MCSamp} > ]; > Length[Select[dfts, # >= T &]]/MCSamp // N > ] > > Example: > > data1 = {0.386653, 1.10925, 0.871822, -0.266199, 2.00516, -1.48574, > -0.68592, -0.0461418, -0.29906, 0.209381}; > > data2 = {-0.283594, -1.08097, 0.915052, 0.448915, -0.88062, -0.140511, > -0.0812646, -1.1592, 0.138245, -0.314907}; > > In[41]:= KolmogorovSmirnov2Sample[data1, data2] > > Out[41]= 0.67082 > > Using 1000 bootstrap samples... > > In[42]:= KS2BootStrapPValue[data1, data2, .67082, 1000] > > Out[42]= 0.791 > > -Andy The ugly procedural code in ks2 returns the exact p-value very quickly: AbsoluteTiming@ks2[data1,data2] {0.005842 Second,{30,0.78693}}