MathGroup Archive: March 2007 [00754]


Re: Mathematica function for Chi Square Test...

To: mathgroup at smc.vnet.net
Subject: [mg74495] Re: [mg74300] Mathematica function for Chi Square Test...
From: Darren Glosemeyer <darreng at wolfram.com>
Date: Fri, 23 Mar 2007 19:05:30 -0500 (EST)
References: <200703170710.CAA19110@smc.vnet.net>

Richard Palmer wrote:
> ... anybody have a reference / already done / ... piece of code to do
> a Chi Square test?  I'm matching sample vectors to non-standard
> probability distributions.
>
>   

The basic idea is to compare frequency of data within bins with expected 
frequency within those same bins. The test statistic of interest is

Total[(observedcounts - expectedcounts)^2/expectedcounts]

If the expected counts are fully specified in advance, with no parameters 
estimated from the data, this statistic approximately follows a chi-square 
distribution with Length[observedcounts]-1 degrees of freedom. If the 
expected counts come from fitting a parametric distribution (and some 
regularity conditions apply), there are 
Length[observedcounts]-NumberOfParameters-1 degrees of freedom.
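
As a minimal worked example of the fully specified case, assume 100 
observations spread over five equally likely categories. The counts are 
hypothetical and made up for illustration; they are not from the original 
question. The sketch assumes CDF and ChiSquareDistribution are available 
(in Mathematica 5.x that means loading the Statistics packages first).

```mathematica
(* In Mathematica 5.x, load the package first: <<Statistics` *)

(* Hypothetical observed counts; expected counts are fixed in advance *)
observedcounts = {18, 22, 20, 25, 15};
expectedcounts = {20, 20, 20, 20, 20};

(* Chi-square statistic: (4 + 4 + 0 + 25 + 25)/20 = 29/10 *)
stat = Total[(observedcounts - expectedcounts)^2/expectedcounts]

(* No parameters were estimated, so df = 5 - 1 = 4 *)
df = Length[observedcounts] - 1

(* Upper-tail probability of the statistic under the chi-square law *)
pvalue = 1 - CDF[ChiSquareDistribution[df], N[stat]]
```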

Starting with lists of observed and expected counts, the process is 
pretty straightforward. In the parametric case, there is the question of 
how many bins to use and where to place them. Additionally, the 
parameters need to be estimated (typically by minimizing the chi-square 
statistic above with respect to the parameters).

The following uses equal width bins as a demonstration (and assumes 
necessary regularity conditions hold). There may be reason to use 
unequal bins in your case, but I do not know the details for suggested 
binning of data.

First simulate some data:

In[1]:= <<Statistics`

In[2]:= n = 20;

In[3]:= data = RandomArray[BetaDistribution[2, 5], n];

The observed counts can be obtained via BinCounts for evenly spaced bins 
or RangeCounts for unequally spaced bins.
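
For unequally spaced bins, the observed counts and the bin probabilities 
can both be driven by a single list of cut points. The following is only 
a sketch under stated assumptions: the sample values and cut points are 
hypothetical, the counting is done with Count as a version-independent 
stand-in for RangeCounts, and a beta distribution with known parameters 
is used for brevity.

```mathematica
(* In Mathematica 5.x, load the package first: <<Statistics` *)

(* Hypothetical sample and hypothetical cut points on [0, 1] *)
sample = {0.05, 0.12, 0.18, 0.22, 0.27, 0.31, 0.35, 0.44, 0.52, 0.71};
cuts = {0, 0.15, 0.3, 0.5, 1};

(* Observed counts in the half-open intervals [cuts[[i]], cuts[[i+1]]) *)
observed = Table[
   Count[sample, x_ /; cuts[[i]] <= x < cuts[[i + 1]]],
   {i, Length[cuts] - 1}]
(* gives {2, 3, 3, 2} *)

(* Bin probabilities are differences of the CDF at consecutive cut points *)
cdfvals = Map[CDF[BetaDistribution[2, 5], #] &, cuts];
probs = Rest[cdfvals] - Most[cdfvals];

(* Expected counts: sample size times bin probability *)
expected = Length[sample] probs
```

The statistic is then formed from observed and expected exactly as for 
equal-width bins.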

In[4]:= bins = BinCounts[data, {0, 1, .1}]

Out[4]= {3, 3, 5, 4, 4, 1, 0, 0, 0, 0}

The i-th expected count is n*p_i, where p_i is the probability of 
falling in the i-th bin.

In[5]:= expect =
          n Table[CDF[BetaDistribution[a, b], x] -
             CDF[BetaDistribution[a, b], x - .1], {x, .1, 1, .1}];

expr will be our chi-square test statistic. Parameter estimates are 
obtained by minimizing it with respect to the parameters.

In[6]:= expr = Total[(bins - expect)^2/expect];

In[7]:= res = NMinimize[{expr, a > 0 && b > 0}, {a, b}]

Out[7]= {2.66879, {a -> 2.0491, b -> 5.12127}}

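Here is the p-value. The degrees of freedom are the number of bins minus 
the 2 estimated parameters minus 1, that is Length[bins] - 2 - 1 = 7.

In[8]:= 1 - CDF[ChiSquareDistribution[Length[bins] - 2 - 1], res[[1]]]

For the statistic in Out[7] this evaluates to approximately 0.91.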

Small p-values indicate a lack of fit. This p-value is large, indicating 
a good fit, as would be expected: the data were generated from a beta 
distribution and are being compared to a beta distribution.
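
The steps above can be collected into a small helper. This is only a 
sketch: the name chiSquareGOF and its argument order are hypothetical 
(it is not a built-in or package function), and it assumes the caller 
supplies expected counts with any fitted parameters already substituted. 
The example call reuses the observed counts from Out[4] and the parameter 
estimates from Out[7].

```mathematica
(* In Mathematica 5.x, load the package first: <<Statistics` *)

(* Hypothetical helper: returns {statistic, degrees of freedom, p-value}.
   nparams is the number of parameters estimated from the data. *)
chiSquareGOF[observed_List, expected_List, nparams_Integer:0] :=
 Module[{stat, df},
  stat = Total[(observed - expected)^2/expected];
  df = Length[observed] - nparams - 1;
  {stat, df, 1 - CDF[ChiSquareDistribution[df], stat]}]

(* Observed counts and fitted parameters taken from the session above *)
bins = {3, 3, 5, 4, 4, 1, 0, 0, 0, 0};
fitted = 20 Table[
    CDF[BetaDistribution[2.0491, 5.12127], x] -
     CDF[BetaDistribution[2.0491, 5.12127], x - .1],
    {x, .1, 1, .1}];

(* Two parameters were estimated, so df = 10 - 2 - 1 = 7 *)
chiSquareGOF[bins, fitted, 2]
```

The first element of the result should be close to the 2.66879 reported 
in Out[7], since the rounded estimates are plugged back in.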

Darren Glosemeyer
Wolfram Research

