Re: Kolmogorov-Smirnov 2-sample test
- To: mathgroup at smc.vnet.net
- Subject: [mg111088] Re: Kolmogorov-Smirnov 2-sample test
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Tue, 20 Jul 2010 03:41:51 -0400 (EDT)
On 7/19/10 at 2:11 AM, aaronbramson at gmail.com (Aaron Bramson) wrote:

>I would like to perform a 2-sample k-s test. I've seen some posts
>on the archive about the one-sample goodness-of-fit version of the
>Kolmogorov-Smirnov test, but I'm interested in the 2-sample version.
>Here's a description of the method:
>http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov.E2.80.93Smirnov_test

>I think all the necessary components are available; e.g.
>Accumulate[BinCounts[_list_]] to get the ecdf of both datasets, abs,
>max of a list, etc. But the data management is a bit above my
>current skill level. Also, since all other software packages seem
>to include this test capability, I would be really surprised if
>there wasn't a package somewhere that included it by now, but I've
>searched a lot and can't find it. Can anybody help me locate
>this?

>Alternatively, would anybody like to work with me to build this in
>case it can't be found?

It is simple to create a function that will do what is needed.
For example,

ksTwoSampleTest[xdata_, ydata_] :=
 Module[{nx, ny, k},
  {nx, ny} = Length /@ {xdata, ydata};
  k = Max[nx, ny];
  Sqrt[nx ny/(nx + ny)] Max@
    Table[Abs[Quantile[xdata, x/k] - Quantile[ydata, x/k]], {x, k}]]

Note that while this is a simple implementation, it may not be
optimal for large data sets. My *guess* is that by using Quantile
and not pre-sorting the data, this code does more work than is
really needed. I suspect that the approach I've used here has a
complexity of order n^2, which shouldn't be a problem for modest
data sets but will certainly be an issue for large data sets.
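
One caveat on the function above: it takes the largest gap between matching quantiles, which is a horizontal distance and changes with the scale of the data. The statistic described on the Wikipedia page is the largest vertical gap between the two empirical CDFs, which always lies between 0 and 1. What follows is a minimal sketch of that version, not a tested package. It assumes purely numeric input, and the name ksStatistic is made up for this example. It sorts the pooled sample once, so the cost should be roughly n log n rather than n^2.

```mathematica
(* Two-sample K-S statistic D = max |Fx(t) - Fy(t)| over all data points t.
   ksStatistic is an illustrative name, not a built-in. Assumes numeric data. *)
ksStatistic[xdata_List, ydata_List] :=
 Module[{nx, ny, pooled, ord, steps, diff, ends},
  {nx, ny} = Length /@ {xdata, ydata};
  pooled = N[Join[xdata, ydata]];
  ord = Ordering[pooled];  (* a single sort of the pooled sample *)
  (* each x value raises Fx - Fy by 1/nx, each y value lowers it by 1/ny *)
  steps = Join[ConstantArray[1/nx, nx], ConstantArray[-1/ny, ny]][[ord]];
  diff = Accumulate[steps];  (* Fx - Fy after each pooled point *)
  (* with ties, read the difference only at the end of each run of equal values *)
  ends = Accumulate[Length /@ Split[pooled[[ord]]]];
  Max[Abs[diff[[ends]]]]]

ksStatistic[{1, 2, 3}, {4, 5, 6}]  (* completely separated samples give 1 *)
ksStatistic[{1, 2, 3}, {1, 2, 3}]  (* identical samples give 0 *)
```

To put this on the same scale as the function above, multiply by Sqrt[nx ny/(nx + ny)]. For large samples the commonly tabulated 5% critical value for that scaled statistic is about 1.36.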