Re: fit a BinomialDistribution to exptl data?
- To: mathgroup at smc.vnet.net
- Subject: [mg80473] Re: [mg80415] fit a BinomialDistribution to exptl data?
- From: Darren Glosemeyer <darreng at wolfram.com>
- Date: Thu, 23 Aug 2007 01:08:35 -0400 (EDT)
- References: <200708220838.EAA08485@smc.vnet.net>
Gordon Robertson wrote: > Given a list of data values, or a list of x-y data points for > plotting the data as an empirical distribution function, how can I > fit a BinomialDistribution to the data? The help documentation for > FindFit shows examples in which the user indicates which function > should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]), > and I've been unable to find an example in which a statistical > distribution is being fit to data. Mathematica complains when I try the > following with an xy list of data that specified an EDF: FindFit > [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k]. > > G > -- > Gordon Robertson > Canada's Michael Smith Genome Sciences Centre > Vancouver BC Canada > > Non-default starting values are needed. By default, FindFit will use a starting value of 1 for each parameter, which will be problematic in this case. The starting value for n should be at least as large as the largest binomial in the sample, and the value for pp should be strictly between 0 and 1. Here is an example. In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10] Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9} In[2]:= edf = Sort[Tally[binom]]; In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom]; In[4]:= FindFit[edf, CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, {pp, .5}}, k] Out[4]= {n -> 17.3082, pp -> 0.4773} Floor[n] is the value of n to take. Note that FindFit gives a least squares fit of the edf to the cdf. Alternatively, a maximum likelihood estimate of the parameters can be obtained by maximizing the log likelihood function (the sum of the logs of the pdf with unknown parameters evaluated at the data points) with respect to the parameters. In[5]:= loglike = PowerExpand[ Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &, binom]]]]; Constraints should be used to keep the parameters in the feasible range. In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n, Max[binom] + 1}, {pp, .5}] Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}} Darren Glosemeyer Wolfram Research
- References:
- fit a BinomialDistribution to exptl data?
- From: Gordon Robertson <agrobertson@telus.net>
- fit a BinomialDistribution to exptl data?