RE:Kolmogorov-Smirnov statistic
*To*: mathgroup at smc.vnet.net
*Subject*: [mg9295] RE:[mg8992] Kolmogorov-Smirnov statistic
*From*: tgarza at mail.internet.com.mx
*Date*: Mon, 27 Oct 1997 02:47:05 -0500
*Sender*: owner-wri-mathgroup at wolfram.com
Robert wrote:
>I have a list of data points and I wish to use
>the Kolmogorov-Smirnov statistic to decide whether
>they look gaussian. Has anyone done this in Mathematica?
Robert,
There should be no problem in computing the Kolmogorov-Smirnov (K-S)
statistic. If you look at the book by Morris DeGroot (Probability and
Statistics, 2nd ed., 1986, Addison-Wesley), in Section 9.6 you'll find
the expression you want to calculate (formula (6)).
However, I've found that it is much better to use the essential idea in
K-S theory, namely finding the maximum differences between the original
distribution (e.g. gaussian) and the cdf of a sample, but to obtain the
critical values through simulation instead of using the large sample
approximation derived by the great masters back in the 30's.
Fix your sample size, say n. Then obtain n random values from the
gaussian (I'd rather call it normal), using the standard
Random[NormalDistribution] in the Statistics package, and calculate the
corresponding empirical distribution function (edf). Determine the
maximum distance, say d, between the edf and the normal distribution
function.
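As a rough illustration of this single step, here is a minimal sketch in
Python rather than Mathematica. It assumes the reference distribution is
a normal with known mean 0 and standard deviation 1; the names
normal_cdf and ks_distance are made up for the example and do not come
from any package.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance(sample, cdf=normal_cdf):
    """Maximum distance d between the edf of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The edf jumps from i/n to (i+1)/n at x, so the largest gap can
        # occur on either side of the step; check both.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


random.seed(1997)  # arbitrary seed, only to make the run repeatable
n = 50             # assumed sample size for the illustration
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
d = ks_distance(sample)
print(d)
```

Checking both sides of each step matters: looking only at the edf value
at each data point can miss the largest gap.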
Repeat this procedure a large number of times, 1000 or 10000 if you have
the time and patience. You'll end up with 1000 (or 10000) values of d.
Calculate the empirical distribution function of these values. This
edf gives you the (estimated) distribution function of the maximum
difference one is likely to obtain in sampling from a normal
distribution (or any other, for that matter). Choose the probability
you like (level of significance, the statisticians call it), and the
corresponding upper quantile of that edf gives you the critical value
of d for your test.
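The whole simulation can be sketched along the following lines, again in
Python and again assuming a standard normal with known parameters. The
function simulated_critical_value, the sample size 20, the 5% level and
the 1000 repetitions are all illustrative choices, not part of any
library.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_distance(sample):
    """Maximum distance between the edf of `sample` and the normal cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - normal_cdf(x), normal_cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


def simulated_critical_value(n, alpha=0.05, reps=1000):
    """Estimate the distance d exceeded with probability alpha when the
    data really are normal, from `reps` simulated samples of size n."""
    ds = sorted(
        ks_distance([random.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Empirical (1 - alpha) quantile of the simulated distances.
    k = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return ds[k]


random.seed(9295)  # arbitrary seed, only to make the run repeatable
d_crit = simulated_critical_value(n=20, alpha=0.05, reps=1000)
print(d_crit)
```

To use it, compute d for your own data with the same n and conclude that
the data do not look normal at the chosen level if d exceeds d_crit.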
I've found that this procedure is more precise than the K-S test, in the
sense that it leads to stricter criteria for a given sample size.
I.e., it gives smaller values for the maximum differences than the
K-S does.
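For comparison, the large-sample critical value can be sketched as
follows. This is an approximation that keeps only the leading term
2*exp(-2*x^2) of the limiting Kolmogorov tail probability, which is
adequate for the usual small significance levels; the function name
asymptotic_critical_value is invented for the example.

```python
import math


def asymptotic_critical_value(n, alpha=0.05):
    """Large-sample K-S critical value for sample size n.

    Solves 2 * exp(-2 * x**2) = alpha for x and divides by sqrt(n).
    """
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


# Tabulate the approximation for a few sample sizes at the 5% level.
for n in (10, 20, 50, 100):
    print(n, round(asymptotic_critical_value(n), 4))
```

Setting this value beside a simulated one for the same n and the same
level shows how far apart the two criteria are.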
Good luck,
Tomas Garza
Cerrada de Cortes 31
Mexico 01040, D.F. Mexico