       Re: Re: stats

• To: mathgroup at smc.vnet.net
• Subject: [mg39674] Re: Re: stats
• From: Dr Bob <drbob at bigfoot.com>
• Date: Sat, 1 Mar 2003 02:48:23 -0500 (EST)
• References: <200302280950.EAA02989@smc.vnet.net>
• Reply-to: drbob at bigfoot.com
• Sender: owner-wri-mathgroup at wolfram.com

```One might want to start at

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm

Bobby

On Fri, 28 Feb 2003 04:50:04 -0500 (EST), Bill Rowe
<listuser at earthlink.net> wrote:

> On 2/27/03 at 12:27 AM, marc.noel at skynet.be (Marc Noël) wrote:
>
>
>> I would like to know if anyone has written a small app that tells
>> Mathematica to look at a set of data and output an answer like
>> "with a degree of confidence of X% this data can be considered
>> normally or log-normally or Student, ... distributed". If not, can
>> anybody tell me how to use the available tools to achieve that kind of
>> result?
>
> This is actually a rather difficult problem. None of the tools in the
> standard Mathematica distribution are designed to address this problem
> directly.
>
> There are two approaches to solving this problem. First, you design a
> test to compare your data to a specific distribution.  For example,
> consider a normal distribution. The skewness is 0 and the kurtosis is 3.
> If the skewness and kurtosis of your data set were significantly
> different from these values, that would be evidence your data is
> non-normal. Confidence limits could be estimated from the sampling
> distributions of the skewness and kurtosis. Using statistics like
> skewness and kurtosis almost certainly isn't optimal, particularly for
> small samples. The problem is that these statistics involve high-order
> moments and are strongly affected by outliers in small data sets.
>
> The other basic approach is to compute a statistic like the
> Kolmogorov-Smirnov (KS) statistic. While this statistic is much more
> robust against outliers, it is also much less efficient than a test
> tailored to a specific distribution. It is probably a better choice than
> kurtosis and skewness for small data sets. The distribution of the KS
> statistic is known, which allows confidence limits to be estimated.
>
>> In principle I think one should first look at the data distribution,
>> then compare it to a standard (normal, log normal, ...)
>
> If I take "look" to mean plot the data distribution, then it is possible
> to do both of these steps at once. In fact, plotting the data
> distribution in an appropriate manner is probably far better than simply
> computing a test statistic and estimating confidence limits. The general
> idea is to construct a Q-Q plot of the data.
>
> Again using a normal distribution as an example, you could do the
> following:
>
> d={First[#],Length[#]}&/@Split[Sort[data]];
> f=Rest[FoldList[Plus,0,Last/@d]]/(Length[data]+1);
> ListPlot[MapThread[{First[#1],Quantile[NormalDistribution[0,1],#2}]&,{d,f}]];
>
> What I've done here is compute the empirical cumulative distribution
> function of your data in f (dividing by Length[data]+1 keeps every value
> strictly below 1) and plot each distinct data value against the
> corresponding quantile of the unit normal distribution. If the data is
> normally distributed, this should plot as a straight line. Significant
> deviations would be an indication of non-normality.
>
> Basically, the idea is to plot quantiles computed from your data against
> the expected quantiles of a given distribution. If the given distribution
> is a good model for the data, then the resulting plot should be a
> straight line.
>
> Assuming you lack any reason to select a particular distribution,
> plotting the data in this manner is probably always the best choice
> unless you have too many data sets for this to be manageable. Even if you
> do have reason to select a given distribution, plotting the data in this
> manner is a good idea as a check on the data set.
>
> One last thought. While it is possible to base the choice of distribution
> on a test statistic such as the KS statistic and confidence limits, this
> should always be a method of last resort. It is far, far better to choose
> the distribution based on knowledge of the physical problem and its
> characteristics than on the value of some test statistic.
>
>

--
majort at cox-internet.com
Bobby R. Treat

```
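
For readers on a recent Mathematica release, the three checks discussed in the quoted message (skewness and kurtosis, the KS statistic, and a Q-Q plot) might be sketched as below. This is only an illustration under stated assumptions: it assumes a version in which `Skewness`, `Kurtosis`, `RandomVariate`, `KolmogorovSmirnovTest`, and `QuantilePlot` are built in, which was not the case for the standard distribution when this thread was written. The sample `data` is made up, and fitting the normal by the sample mean and standard deviation is an illustrative choice, not something the thread prescribes.

```mathematica
(* Hypothetical sample, used only for illustration; substitute your own data *)
SeedRandom[1];
data = RandomVariate[NormalDistribution[10, 2], 200];

(* Moment check: a normal distribution has skewness 0 and kurtosis 3 *)
{Skewness[data], Kurtosis[data]}

(* Normal distribution fitted by sample mean and standard deviation
   (an assumption made for this sketch) *)
fitted = NormalDistribution[Mean[data], StandardDeviation[data]];

(* KS statistic by hand: the largest gap between the empirical CDF
   (a step function jumping by 1/n at each sorted point) and the fitted CDF *)
n = Length[data];
sorted = Sort[data];
cdf = CDF[fitted, #] & /@ sorted;
ks = Max[Max[Range[n]/n - cdf], Max[cdf - (Range[n] - 1)/n]]

(* Built-in test; by default it returns a p-value *)
KolmogorovSmirnovTest[data, fitted]

(* Built-in Q-Q plot of the data against the fitted normal;
   points near the reference line suggest the normal is a reasonable model *)
QuantilePlot[data, fitted]
```

Because the parameters of `fitted` are estimated from the same sample, the p-value should be read as approximate. The same pattern could be tried with `LogNormalDistribution` or another candidate in place of the normal.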
