RE:Kolmogorov-Smirnov statistic
- To: mathgroup at smc.vnet.net
- Subject: [mg9295] RE:[mg8992] Kolmogorov-Smirnov statistic
- From: tgarza at mail.internet.com.mx
- Date: Mon, 27 Oct 1997 02:47:05 -0500
- Sender: owner-wri-mathgroup at wolfram.com
Robert wrote:

>I have a list of data points and I wish to use
>the Kolmogorov-Smirnov statistic to decide whether
>they look gaussian. Has anyone done this in Mathematica?

Robert,

There should be no problem in computing the Kolmogorov-Smirnov (K-S) statistic. If you look at the book by Morris DeGroot (Probability and Statistics, 2nd ed., 1986, Addison-Wesley), in Section 9.6 you'll find the expression you want to calculate (formula (6)).

However, I've found that it is much better to use the essential idea in K-S theory, namely finding the maximum difference between the hypothesized distribution function (e.g. gaussian) and the empirical distribution function of a sample, but to obtain the critical values through simulation instead of using the large-sample approximation derived by the great masters back in the 30's.

Fix your sample size, say n. Then obtain n random values from the gaussian (I'd rather call it normal), using the standard Random[NormalDistribution] in the Statistics package, and calculate the corresponding empirical distribution function (edf). Determine the maximum distance, say d, between the edf and the normal distribution function.

Repeat this procedure a large number of times, 1000 or 10000 if you have the time and patience. You'll end up with 1000 (or 10000) values of d. Calculate the empirical distribution function of these values of d. This edf gives you the (estimated) distribution function of the maximum differences one is likely to obtain in sampling from a normal distribution (or any other, for that matter). Choose the probability you like (level of significance, the statisticians call it), and the corresponding quantile of this edf gives you the critical value of d for your test.

I've found that this procedure is more precise than the K-S test, in the sense that it leads to stricter criteria for a given sample size. I.e., it gives smaller critical values for the maximum difference than the K-S approximation does.

Good luck,

Tomas Garza
Cerrada de Cortes 31
Mexico 01040, D.F.
Mexico
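
For readers without the Statistics package at hand, the simulation procedure described above can be sketched roughly as follows. This is only an illustrative sketch in Python, not the code used for the results mentioned in the post. The function names (normal_cdf, ks_distance, simulated_critical_value), the default of 1000 repetitions, the 5% level, and the fixed seed are all assumptions made for the example. It also assumes the null distribution is fully specified (a standard normal).

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance(sample, cdf):
    """Maximum distance d between the edf of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The edf jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_critical_value(n, alpha=0.05, reps=1000, seed=0):
    """Estimate the (1 - alpha) quantile of d for samples of size n."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    ds = sorted(
        ks_distance([rng.gauss(0.0, 1.0) for _ in range(n)], normal_cdf)
        for _ in range(reps)
    )
    # Smallest simulated d at which the edf of the d values reaches 1 - alpha.
    k = max(0, min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1))
    return ds[k]


if __name__ == "__main__":
    n = 20
    crit = simulated_critical_value(n)
    data = [random.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for real data
    d_obs = ks_distance(data, normal_cdf)
    print("critical d:", crit, "observed d:", d_obs, "reject:", d_obs > crit)
```

One would reject normality when the observed d exceeds the simulated critical value. If the mean and standard deviation are estimated from the data rather than known in advance, the same estimation step would presumably have to be repeated inside each simulated sample so that the simulated d values remain comparable to the observed one.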