       Re: Mathematica function for Chi Square Test...

• To: mathgroup at smc.vnet.net
• Subject: [mg74495] Re: [mg74300] Mathematica function for Chi Square Test...
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Fri, 23 Mar 2007 19:05:30 -0500 (EST)
• References: <200703170710.CAA19110@smc.vnet.net>

```
Richard Palmer wrote:
> ... anybody have a reference / already done / ... piece of code to do
> a Chi Square test?  I'm matching sample vectors to non-standard
> probability distributions.
>
>

The basic idea is to compare frequency of data within bins with expected
frequency within those same bins. The test statistic of interest is

Total[(observedcounts - expectedcounts)^2/expectedcounts]

If the expected counts are fully specified in advance (no parameters
are estimated from the data), this statistic approximately follows a
chi-square distribution with Length[observedcounts]-1 degrees of
freedom. If the expected counts are based on fitting a parametric
distribution to the data (and some regularity conditions apply), there
are Length[observedcounts]-NumberOfParameters-1 degrees of freedom.

Starting with lists of observed and expected counts, the process is
pretty straightforward. In the parametric case, there is a question of
how best to subdivide the region into bins and how many bins to use.
Additionally, the parameters need to be estimated (typically by
minimizing the chi-square statistic above with respect to the parameters).

The following uses equal width bins as a demonstration (and assumes
necessary regularity conditions hold). There may be reason to use
unequal bins in your case, but I do not know the details for suggested
binning of data.

First simulate some data:

In:= <<Statistics`

In:= n = 20;

In:= data = RandomArray[BetaDistribution[2, 5], n];

The observed counts can be obtained via BinCounts for evenly spaced bins
or RangeCounts for unequally spaced bins.

In:= bins = BinCounts[data, {0, 1, .1}]

Out= {3, 3, 5, 4, 4, 1, 0, 0, 0, 0}

The ith expected count is n*p_i, where p_i is the probability of being
in the ith bin.

In:= expect =
n Table[CDF[BetaDistribution[a, b], x] -
CDF[BetaDistribution[a, b], x - .1], {x, .1, 1, .1}];

expr will be our chi-square test statistic. Parameter estimates are
obtained by minimizing it with respect to the parameters.

In:= expr = Total[(bins - expect)^2/expect];

In:= res = NMinimize[{expr, a > 0 && b > 0}, {a, b}]

Out= {2.66879, {a -> 2.0491, b -> 5.12127}}

Here is the p-value.

In:= 1 - CDF[ChiSquareDistribution[Length[bins] - 2 - 1], res[[1]]]

Out= 0.914

Small p-values indicate a lack of fit. This p-value is large, indicating
a good fit, as would be expected since the data were generated from a
beta distribution and are being compared to a beta distribution.

Darren Glosemeyer
Wolfram Research

```
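
As a minimal sketch, the two steps in the post (compute the statistic, then the p-value) can be wrapped in one helper. The name chiSquareGOF and its nparams argument are hypothetical and not part of any Mathematica package, and the counts in the usage line are made up for illustration. It assumes ChiSquareDistribution is available (built in for recent versions, loaded via <<Statistics` in older ones).

```mathematica
(* Hypothetical helper: Pearson chi-square goodness-of-fit test
   from observed and expected bin counts. nparams is the number
   of parameters estimated from the data (0 if none). *)
chiSquareGOF[observed_List, expected_List, nparams_Integer: 0] :=
 Module[{stat, dof},
  (* test statistic: sum of (O - E)^2 / E over bins *)
  stat = Total[(observed - expected)^2/expected];
  (* degrees of freedom: bins - estimated parameters - 1 *)
  dof = Length[observed] - nparams - 1;
  (* return {statistic, p-value} *)
  {stat, 1 - CDF[ChiSquareDistribution[dof], stat]}]

(* Made-up counts: 100 observations, 5 equally likely bins *)
N[chiSquareGOF[{18, 22, 20, 25, 15}, {20, 20, 20, 20, 20}]]
```

For the made-up counts the statistic is 2.9 on 4 degrees of freedom. For the fitted beta example in the post, the corresponding call would be chiSquareGOF[bins, expect /. res[[2]], 2], since two parameters (a and b) were estimated.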
