Re: Kolmogorov-Smirnov 2-sample test

• To: mathgroup at smc.vnet.net
• Subject: [mg111085] Re: Kolmogorov-Smirnov 2-sample test
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Tue, 20 Jul 2010 03:41:19 -0400 (EDT)

```Aaron Bramson wrote:
> Hello Everybody,
>
> I would like to perform a 2-sample k-s test.  I've seen some posts on the
> archive about the one-sample goodness-of-fit version of the
> Kolgomorov-Smirnov test, but I'm interested in the 2-sample version.
>
> Here's a description of the method:
> http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test
>
> I think all the necessary components are available; e.g.
> Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs, max of
> a list, etc.  But the data management is a bit above my current skill
> level.  Also, since all other software packages seem to include this test
> capability, I would be really surprised if there
> wasn't a package somewhere that included it by now, but I've searched a lot
> and can't find it.  Can anybody help me locate this this?
>
> Alternatively, would anybody like to work with me to build this in case it
> can't be found?
>
> Aaron
>
>
>

Hi Aaron,

Here is some code written by Andy Ross at Wolfram  for the two sample
Kolmogorov-Smirnov test. KolmogorovSmirnov2Sample computes the test
statistic, and KSBootstrapPValue provides a bootstrap estimate of the
p-value given the two data sets, the number of simulations for the
estimate and the test statistic.

In[1]:= empiricalCDF[data_, x_] := Length[Select[data, # <= x
&]]/Length[data]

In[2]:= KolmogorovSmirnov2Sample[data1_, data2_] :=
Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2,
udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
n2 = Length[data2], T},
e1 = empiricalCDF[sd1, #] & /@ udat;
e2 = empiricalCDF[sd2, #] & /@ udat;
T = Max[Abs[e1 - e2]];
(1/Sqrt[n1]) (Sqrt[(n1*n2)/(n1 + n2)]) T
]

In[3]:= KSBootstrapPValue[data1_, data2_, nSamp_, T_] :=
Block[{udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
n2 = Length[data2], d1Samples, d2Samples, Tdist},
d1Samples = RandomChoice[udat, {nSamp, n1}];
d2Samples = RandomChoice[udat, {nSamp, n2}];
Tdist =
KolmogorovSmirnov2Sample[#1, #2] &, {d1Samples, d2Samples}];
1 - empiricalCDF[Tdist, T] // N
]

(* create two data sets *)
In[4]:= data1 = RandomReal[NormalDistribution[], 15];

In[5]:= data2 = RandomReal[NormalDistribution[], 15];

(* compute the test statistic *)
In[6]:= stat = KolmogorovSmirnov2Sample[data1, data2] // N

Out[6]= 0.329983

(* estimate the p-value from 10000 bootstrap samples *)
In[7]:= KSBootstrapPValue[data1, data2, 10000, stat]

Out[7]= 0.0213

Darren Glosemeyer
Wolfram Research

```

• Prev by Date: Re: Transform differential equation by tranformation rule
• Next by Date: Re: Avoid the use of certain functions
• Previous by thread: Re: Kolmogorov-Smirnov 2-sample test
• Next by thread: Re: Kolmogorov-Smirnov 2-sample test