MathGroup Archive 2010

Re: Kolmogorov-Smirnov 2-sample test

  • To: mathgroup at
  • Subject: [mg111119] Re: Kolmogorov-Smirnov 2-sample test
  • From: Bill Rowe <readnews at>
  • Date: Wed, 21 Jul 2010 07:11:30 -0400 (EDT)

On 7/20/10 at 3:41 AM, darreng at (Darren Glosemeyer) wrote:

>Here is some code written by Andy Ross at Wolfram for the two
>sample Kolmogorov-Smirnov test. KolmogorovSmirnov2Sample computes
>the test statistic, and KSBootstrapPValue provides a bootstrap
>estimate of the p-value given the two data sets, the number of
>simulations for the estimate and the test statistic.

>In[1]:= empiricalCDF[data_, x_] := Length[Select[data, # <= x &]]/Length[data]

>In[2]:= KolmogorovSmirnov2Sample[data1_, data2_] :=
>Block[{sd1 = Sort[data1], sd2 = Sort[data2], e1, e2,
>udat = Union[Flatten[{data1, data2}]], n1 = Length[data1],
>n2 = Length[data2], T},
>e1 = empiricalCDF[sd1, #] & /@ udat;
>e2 = empiricalCDF[sd2, #] & /@ udat;
>T = Max[Abs[e1 - e2]];
>(1/Sqrt[n1]) (Sqrt[(n1*n2)/(n1 + n2)]) T]

After looking at your code above, I realized I posted a very bad
solution to this problem. But it looks to me like there is also a
problem with this code. The returned result

(1/Sqrt[n1]) (Sqrt[(n1*n2)/(n1 + n2)]) T

seems to have an extra factor in it, specifically 1/Sqrt[n1].
With that factor the result simplifies to Sqrt[n2/(n1 + n2)] T,
which is not symmetric in n1 and n2. Since n1 is the number of
samples in the first data set, including this factor means you
get a different result when you interchange the order of the
arguments and the two data sets have different numbers of
samples. Since the KS statistic is based on the maximum
difference between the empirical CDFs, the order in which the
data sets are passed to the function should not matter.
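
The order dependence described above can be checked numerically.
What follows is only a rough sketch, written in Python rather than
Mathematica, and it is not the code from the quoted post. The
function names and the two small data sets are made up for
illustration, and the empirical CDF is assumed to be the usual
fraction of observations less than or equal to x. The second
scaling simply drops the 1/Sqrt[n1] factor as suggested above;
whether that is the scaling the original author intended is not
confirmed here.

```python
from math import sqrt


def empirical_cdf(data, x):
    """Fraction of observations in data that are <= x (assumed definition)."""
    return sum(1 for v in data if v <= x) / len(data)


def ks_max_difference(data1, data2):
    """Largest absolute gap between the two empirical CDFs (T in the post)."""
    points = sorted(set(data1) | set(data2))
    return max(abs(empirical_cdf(data1, p) - empirical_cdf(data2, p))
               for p in points)


def ks_as_quoted(data1, data2):
    """Return value as written in the quoted code, including 1/Sqrt[n1]."""
    n1, n2 = len(data1), len(data2)
    t = ks_max_difference(data1, data2)
    return (1 / sqrt(n1)) * sqrt(n1 * n2 / (n1 + n2)) * t


def ks_without_extra_factor(data1, data2):
    """Same expression with the 1/Sqrt[n1] factor dropped (hypothetical fix)."""
    n1, n2 = len(data1), len(data2)
    t = ks_max_difference(data1, data2)
    return sqrt(n1 * n2 / (n1 + n2)) * t


# Made-up samples of different sizes (n1 = 3, n2 = 6).
a = [0.1, 0.4, 0.7]
b = [0.2, 0.5, 0.6, 0.8, 0.9, 1.3]

# The two values on the first line differ; the two on the second agree.
print(ks_as_quoted(a, b), ks_as_quoted(b, a))
print(ks_without_extra_factor(a, b), ks_without_extra_factor(b, a))
```

When both data sets have the same number of samples, the two
argument orders agree even with the extra factor, so the problem
only shows up for unequal sample sizes.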
