MathGroup Archive: October 1997 [00329]

RE:Kolmogorov-Smirnov statistic

To: mathgroup at smc.vnet.net
Subject: [mg9295] RE:[mg8992] Kolmogorov-Smirnov statistic
From: tgarza at mail.internet.com.mx
Date: Mon, 27 Oct 1997 02:47:05 -0500
Sender: owner-wri-mathgroup at wolfram.com

Robert wrote:

>I have a list of data points and I wish to use
>the Kolmogorov-Smirnov statistic to decide whether
>they look gaussian. Has anyone done this in Mathematica?

Robert,

There should be no problem in computing the Kolmogorov-Smirnov (K-S)
statistic. If you look at the book by Morris DeGroot (Probability and
Statistics, 2nd ed., 1986, Addison-Wesley), in Section 9.6 you'll find
the expression you want to calculate (formula (6)).

However, I've found that it is much better to use the essential idea in
K-S theory, namely finding the maximum difference between the
hypothesized distribution (e.g. gaussian) and the cdf of a sample, but
to obtain the critical values through simulation instead of using the
large-sample approximation derived by the great masters back in the
1930s.

Fix your sample size, say n. Then obtain n random values from the
gaussian (I'd rather call it normal), using the standard
Random[NormalDistribution] in the Statistics package, and calculate the
corresponding empirical distribution function (edf). Determine the
maximum distance, say d, between the edf and the normal distribution
function.
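
As a rough illustration of this single step, here is a minimal sketch in
Python (not Mathematica; the function name ks_distance, the seed and the
sample size n = 50 are my own choices for the example, and a standard
normal with known mean 0 and standard deviation 1 is assumed). Because
the edf is a step function, the largest gap has to be checked on both
sides of each jump.

```python
import random
from statistics import NormalDist

def ks_distance(sample, cdf):
    """Largest gap between the sample's edf and a given cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The edf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(1997)   # arbitrary seed, only for reproducibility
n = 50                      # hypothetical sample size
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
d = ks_distance(sample, NormalDist(0.0, 1.0).cdf)
print(d)
```

The same function works for any other hypothesized distribution: pass
its cdf in place of the normal one.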

Repeat this procedure a large number of times, 1000 or 10000 if you have
the time and patience. You'll end up with 1000 (or 10000) values d.
Calculate the empirical distribution function of these values d. This
edf gives you the (estimated) distribution function of the maximum
difference one is likely to obtain in sampling from a normal
distribution (or any other, for that matter). Choose the probability
you like (level of significance, the statisticians call it), and the
corresponding upper quantile of this edf gives you the critical value d
for your test.
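
The whole simulation might be sketched as follows, again in Python
rather than Mathematica and only as an illustration. The names
simulated_critical_value, trials and alpha, the seed, and the choice
n = 20 are assumptions made for the example; the sketch also assumes the
mean and standard deviation of the normal are known in advance rather
than estimated from the data.

```python
import random
from statistics import NormalDist

def ks_distance(sample, cdf):
    """Largest gap between the sample's edf and a given cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The edf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def simulated_critical_value(n, alpha, trials, rng):
    """Estimate the d exceeded with probability alpha under normality."""
    cdf = NormalDist(0.0, 1.0).cdf
    ds = sorted(
        ks_distance([rng.gauss(0.0, 1.0) for _ in range(n)], cdf)
        for _ in range(trials)
    )
    # Empirical (1 - alpha) quantile of the simulated distances.
    k = min(trials - 1, int((1 - alpha) * trials))
    return ds[k]

rng = random.Random(1997)   # arbitrary seed, only for reproducibility
crit = simulated_critical_value(n=20, alpha=0.05, trials=2000, rng=rng)
print(crit)
```

To test a data set of the same size n, compute its own distance d
against the hypothesized normal and reject normality at level alpha if
that d exceeds crit.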

I've found that this procedure is more precise than the K-S test, in the
sense that it leads to stricter criteria for a given sample size.
I.e., it gives smaller critical values for the maximum difference than
the large-sample K-S approximation does.
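
For comparison, the large-sample critical value can be computed from the
limiting Kolmogorov distribution, whose upper tail is
2 * Sum[(-1)^(k-1) Exp[-2 k^2 x^2], {k, 1, Infinity}]. The sketch below
(Python, not Mathematica; the function names, the truncation at 100
terms and the bisection bracket are my own choices) solves for the point
c where that tail equals alpha and returns c/sqrt(n), which can then be
set beside a simulated critical value for the same n.

```python
import math

def kolmogorov_tail(x, terms=100):
    """P(K > x) for the limiting Kolmogorov distribution (series form)."""
    return 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )

def asymptotic_critical_value(n, alpha):
    """Large-sample K-S critical value c_alpha / sqrt(n), by bisection."""
    lo, hi = 0.2, 3.0   # bracket assumed wide enough for usual alpha
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kolmogorov_tail(mid) > alpha:
            lo = mid    # tail still too heavy: move right
        else:
            hi = mid
    return hi / math.sqrt(n)

print(asymptotic_critical_value(20, 0.05))
```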

Good luck,

Tomas Garza
Cerrada de Cortes 31
Mexico 01040, D.F. Mexico
