[Date Index]
[Thread Index]
[Author Index]
Re: Kolmogorov-Smirnov 2-sample test
*To*: mathgroup at smc.vnet.net
*Subject*: [mg111085] Re: Kolmogorov-Smirnov 2-sample test
*From*: Darren Glosemeyer <darreng at wolfram.com>
*Date*: Tue, 20 Jul 2010 03:41:19 -0400 (EDT)
Aaron Bramson wrote:
> Hello Everybody,
>
> I would like to perform a 2-sample k-s test. I've seen some posts on the
> archive about the one-sample goodness-of-fit version of the
> Kolgomorov-Smirnov test, but I'm interested in the 2-sample version.
>
> Here's a description of the method:
> http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test
>
> I think all the necessary components are available; e.g.
> Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, max of
> a list, etc. But the data management is a bit above my current skill
> level. Also, since all other software packages seem to include this test
> capability, I would be really surprised if there
> wasn't a package somewhere that included it by now, but I've searched a lot
> and can't find it. Can anybody help me locate this this?
>
> Alternatively, would anybody like to work with me to build this in case it
> can't be found?
>
> Thanks in Advance,
> Aaron
>
>
>
Hi Aaron,
Here is some code written by Andy Ross at Wolfram for the two sample
Kolmogorov-Smirnov test. KolmogorovSmirnov2Sample computes the test
statistic, and KSBootstrapPValue provides a bootstrap estimate of the
p-value given the two data sets, the number of simulations for the
estimate and the test statistic.
In[1]:= empiricalCDF[data_, x_] := Length[Select[data, # <= x
&]]/Length[data]
In[2]:= KolmogorovSmirnov2Sample[data1_, data2_] :=
Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2,
udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
n2 = Length[data2], T},
e1 = empiricalCDF[sd1, #] & /@ udat;
e2 = empiricalCDF[sd2, #] & /@ udat;
T = Max[Abs[e1 - e2]];
(1/Sqrt[n1]) (Sqrt[(n1*n2)/(n1 + n2)]) T
]
In[3]:= KSBootstrapPValue[data1_, data2_, nSamp_, T_] :=
Block[{udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
n2 = Length[data2], d1Samples, d2Samples, Tdist},
d1Samples = RandomChoice[udat, {nSamp, n1}];
d2Samples = RandomChoice[udat, {nSamp, n2}];
Tdist =
MapThread[
KolmogorovSmirnov2Sample[#1, #2] &, {d1Samples, d2Samples}];
1 - empiricalCDF[Tdist, T] // N
]
(* create two data sets *)
In[4]:= data1 = RandomReal[NormalDistribution[], 15];
In[5]:= data2 = RandomReal[NormalDistribution[], 15];
(* compute the test statistic *)
In[6]:= stat = KolmogorovSmirnov2Sample[data1, data2] // N
Out[6]= 0.329983
(* estimate the p-value from 10000 bootstrap samples *)
In[7]:= KSBootstrapPValue[data1, data2, 10000, stat]
Out[7]= 0.0213
Darren Glosemeyer
Wolfram Research
Prev by Date:
**Re: Transform differential equation by tranformation rule**
Next by Date:
**Re: Avoid the use of certain functions**
Previous by thread:
**Re: Kolmogorov-Smirnov 2-sample test**
Next by thread:
**Re: Kolmogorov-Smirnov 2-sample test**
| |