Re: Mathematica function for Chi Square Test...

*To*: mathgroup at smc.vnet.net
*Subject*: [mg74495] Re: [mg74300] Mathematica function for Chi Square Test...
*From*: Darren Glosemeyer <darreng at wolfram.com>
*Date*: Fri, 23 Mar 2007 19:05:30 -0500 (EST)
*References*: <200703170710.CAA19110@smc.vnet.net>

Richard Palmer wrote:
> ... anybody have a reference / already done / ... piece of code to do
> a Chi Square test?  I'm matching sample vectors to non-standard
> probability distributions.

The basic idea is to compare the frequency of data within bins with the expected frequency within those same bins. The test statistic of interest is

Total[(observedcounts - expectedcounts)^2/expectedcounts]

If the expected counts are not based on fitting a parametric distribution, this statistic approximately follows a chi square distribution with Length[observedcounts]-1 degrees of freedom. If the expected counts are based on fitting to a parametric distribution (and some regularity conditions apply), there are Length[observedcounts]-NumberOfParameters-1 degrees of freedom.

Starting with lists of observed and expected counts, the process is pretty straightforward. In the parametric case, there is a question of how best to subdivide the region into bins and how many bins to use. Additionally, the parameters need to be estimated (typically by minimizing the chi-square statistic above with respect to the parameters).

The following uses equal width bins as a demonstration (and assumes necessary regularity conditions hold). There may be reason to use unequal bins in your case, but I do not know the details for suggested binning of data.

First simulate some data:

In[1]:= <<Statistics`

In[2]:= n = 20;

In[3]:= data = RandomArray[BetaDistribution[2, 5], n];

The observed counts can be obtained via BinCounts for evenly spaced bins or RangeCounts for unequally spaced bins.

In[4]:= bins = BinCounts[data, {0, 1, .1}]

Out[4]= {3, 3, 5, 4, 4, 1, 0, 0, 0, 0}

The ith expected count is n*p_i, where p_i is the probability of being in the ith bin.

In[5]:= expect = n Table[CDF[BetaDistribution[a, b], x] -
            CDF[BetaDistribution[a, b], x - .1], {x, .1, 1, .1}];

expr will be our chi-square test statistic. Parameter estimates are obtained by minimizing it with respect to the parameters.

In[6]:= expr = Total[(bins - expect)^2/expect];

In[7]:= res = NMinimize[{expr, a > 0 && b > 0}, {a, b}]

Out[7]= {2.66879, {a -> 2.0491, b -> 5.12127}}

Here is the p-value.

In[8]:= 1 - CDF[ChiSquareDistribution[n - 2 - 1], res[[1]]]

Out[8]= 0.99997

Note that n here is the sample size (20), so this call uses 17 degrees of freedom. By the rule given above, the degrees of freedom should be Length[bins] - 2 - 1 = 7; with 7 degrees of freedom the p-value is roughly 0.91, which is still large.

Small p-values indicate a lack of fit. This p-value is large, indicating a good fit, as would be expected since the data were generated from a beta distribution and they are being compared to a beta distribution.

Darren Glosemeyer
Wolfram Research
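
For readers on a more recent version of Mathematica, the steps above might be sketched roughly as follows. This is only a sketch under stated assumptions: it assumes version 8 or later, where RandomVariate takes the place of RandomArray and the Statistics` package does not need to be loaded. The names edges, observed, expected, stat, est, df and pValue are illustrative and do not come from the original post. The degrees of freedom are taken to be the number of bins minus the two fitted parameters minus one, following the rule stated in the post.

```mathematica
(* Sketch only: assumes Mathematica 8 or later. a and b must be unassigned symbols. *)
n = 20;
data = RandomVariate[BetaDistribution[2, 5], n];

(* Observed counts in ten equal-width bins on [0, 1], given as explicit bin edges. *)
edges = Range[0, 1, 1/10];
observed = BinCounts[data, {edges}];

(* Expected counts n*p_i as functions of the beta parameters a and b. *)
expected = n Table[
    CDF[BetaDistribution[a, b], x] - CDF[BetaDistribution[a, b], x - 1/10],
    {x, 1/10, 1, 1/10}];

(* Minimum chi-square estimates: minimize the test statistic over a and b. *)
{stat, est} = NMinimize[
    {Total[(observed - expected)^2/expected], a > 0 && b > 0}, {a, b}];

(* Degrees of freedom: number of bins - number of fitted parameters - 1. *)
df = Length[observed] - 2 - 1;

(* Upper-tail probability of the chi-square distribution at the minimized statistic. *)
pValue = 1 - CDF[ChiSquareDistribution[df], stat]
```

Because the data are simulated, the counts, estimates and p-value will differ from run to run and from the session shown above. With only 20 observations spread over ten bins, several expected counts are small, so the chi-square approximation should be treated with caution; wider or unequal bins may be more appropriate in practice.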

**References**:
**Mathematica function for Chi Square Test...**
*From:* "Richard Palmer" <rhpalmer@gmail.com>