Re: Mathematica function for Chi Square Test...
*To*: mathgroup at smc.vnet.net
*Subject*: [mg74495] Re: [mg74300] Mathematica function for Chi Square Test...
*From*: Darren Glosemeyer <darreng at wolfram.com>
*Date*: Fri, 23 Mar 2007 19:05:30 -0500 (EST)
*References*: <200703170710.CAA19110@smc.vnet.net>
Richard Palmer wrote:
> ... anybody have a reference / already done / ... piece of code to do
> a Chi Square test? I'm matching sample vectors to non-standard
> probability distributions.
>
>
The basic idea is to compare frequency of data within bins with expected
frequency within those same bins. The test statistic of interest is
Total[(observedcounts - expectedcounts)^2/expectedcounts]
If the expected counts come from a fully specified distribution (no
parameters estimated from the data), this statistic approximately
follows a chi square distribution with Length[observedcounts]-1 degrees
of freedom. If the expected counts are based on fitting to a parametric
distribution (and some regularity conditions apply), there are
Length[observedcounts]-NumberOfParameters-1 degrees of freedom.
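As a small illustration of the fully specified case, the sketch below
tests made-up die-roll counts against a fair die. The data and the
names observed, expected, stat and pvalue are hypothetical and not from
the original question. In Mathematica versions before 6,
ChiSquareDistribution needs the statistics packages loaded first
(<<Statistics`).

```mathematica
(* hypothetical data: 60 rolls of a die, counts for faces 1..6 *)
observed = {8, 12, 9, 11, 10, 10};
(* a fair die gives equal expected counts; no parameters are estimated *)
expected = Table[Total[observed]/6, {6}];
(* chi-square statistic: (4 + 4 + 1 + 1 + 0 + 0)/10 = 1 *)
stat = Total[(observed - expected)^2/expected]
(* degrees of freedom: number of bins minus 1 *)
pvalue = 1 - CDF[ChiSquareDistribution[Length[observed] - 1], N[stat]]
```

The statistic is 1 on 5 degrees of freedom, well below the mean of that
distribution, so the p-value is large and there is no evidence against
a fair die.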
Starting with lists of observed and expected counts, the process is
pretty straightforward. In the parametric case, there is a question of
how best to subdivide the region into bins and how many bins to use.
Additionally, the parameters need to be estimated (typically by
minimizing the chi-square statistic above with respect to the parameters).
The following uses equal width bins as a demonstration (and assumes
necessary regularity conditions hold). There may be reason to use
unequal bins in your case, but I do not know the details for suggested
binning of data.
First simulate some data:
In[1]:= <<Statistics`
In[2]:= n = 20;
In[3]:= data = RandomArray[BetaDistribution[2, 5], n];
The observed counts can be obtained via BinCounts for evenly spaced bins
or RangeCounts for unequally spaced bins.
In[4]:= bins = BinCounts[data, {0, 1, .1}]
Out[4]= {3, 3, 5, 4, 4, 1, 0, 0, 0, 0}
The ith expected count is n*p_i, where p_i is the probability of being
in the ith bin.
In[5]:= expect =
n Table[CDF[BetaDistribution[a, b], x] -
CDF[BetaDistribution[a, b], x - .1], {x, .1, 1, .1}];
expr will be our chi-square test statistic. Parameter estimates are
obtained by minimizing it with respect to the parameters.
In[6]:= expr = Total[(bins - expect)^2/expect];
In[7]:= res = NMinimize[{expr, a > 0 && b > 0}, {a, b}]
Out[7]= {2.66879, {a -> 2.0491, b -> 5.12127}}
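The rules returned by NMinimize can be substituted back into the
expressions already defined. This is only a sketch, assuming res,
expect and bins from the session above and that a and b have no
assigned values; the name fitted is introduced here for illustration.

```mathematica
(* fitted parameter values as a list {a, b} *)
{a, b} /. res[[2]]
(* expected counts under the fitted distribution, to compare with bins *)
fitted = expect /. res[[2]]
(* per-bin contributions to the chi-square statistic; they sum to res[[1]] *)
(bins - fitted)^2/fitted
```

Looking at the per-bin contributions shows which bins, if any, account
for most of the statistic.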
Here is the p-value. With 10 bins and 2 estimated parameters, the
reference distribution has Length[bins] - 2 - 1 = 7 degrees of freedom.
In[8]:= 1 - CDF[ChiSquareDistribution[Length[bins] - 2 - 1], res[[1]]]
Out[8]= 0.9139
Small p-values indicate a lack of fit. This p-value is large, indicating
no evidence of lack of fit, as would be expected since the data were
generated from a beta distribution and are being compared to a beta
distribution.
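The same steps carry over to unequally spaced bins. The following is a
rough sketch rather than a recommendation on binning: the cutoffs in
cuts are arbitrary, the names cuts, obs, cdfs, expectU, exprU and resU
are made up for the example, and it assumes that RangeCounts returns
one extra count at each end (values below the first cutoff and values
at or above the last), which should be checked against the
documentation for your version.

```mathematica
(* hypothetical unequal bin edges covering the interval from 0 to 1 *)
cuts = {0., .1, .2, .3, .5, 1.};
(* drop the two end counts, assumed to be the intervals outside the cutoffs *)
obs = Take[RangeCounts[data, cuts], {2, -2}];
(* bin probabilities are differences of the CDF at consecutive edges *)
cdfs = Map[CDF[BetaDistribution[a, b], #] &, cuts];
expectU = n (Drop[cdfs, 1] - Drop[cdfs, -1]);
(* minimize the chi-square statistic over the two parameters, as before *)
exprU = Total[(obs - expectU)^2/expectU];
resU = NMinimize[{exprU, a > 0 && b > 0}, {a, b}]
(* 5 bins and 2 estimated parameters leave 5 - 2 - 1 = 2 degrees of freedom *)
1 - CDF[ChiSquareDistribution[Length[obs] - 2 - 1], resU[[1]]]
```

Fewer, wider bins in the sparse right tail raise the expected counts
there, at the cost of fewer degrees of freedom.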
Darren Glosemeyer
Wolfram Research