Re: fit a BinomialDistribution to exptl data?

*To*: mathgroup at smc.vnet.net*Subject*: [mg80692] Re: fit a BinomialDistribution to exptl data?*From*: Ray Koopman <koopman at sfu.ca>*Date*: Wed, 29 Aug 2007 04:14:18 -0400 (EDT)*References*: <fagso0$8ad$1@smc.vnet.net>

On Aug 22, 1:39 am, Gordon Robertson <agrobert... at telus.net> wrote: > Given a list of data values, or a list of x-y data points for > plotting the data as an empirical distribution function, how can I > fit a BinomialDistribution to the data? The help documentation for > FindFit shows examples in which the user indicates which function > should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]), > and I've been unable to find an example in which a statistical > distribution is being fit to data. Mathematica complains when I try the > following with an xy list of data that specified an EDF: FindFit > [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k]. > > G > -- > Gordon Robertson > Canada's Michael Smith Genome Sciences Centre > Vancouver BC Canada For fixed n, the maximum likelihood estimate of p is m/n, where m is the sample mean. Then walk n up from the sample maximum until the likelihood is maximized. {n = 20, p = .2}; (* true parameter values *) L = 200; (* sample size *) x = Total@UnitStep[p - Table[Random[],{n},{L}]] (* sample data *) {5,3,4,5,4,3,4,4,5,3,1,4,5,3,2,4,4,5,3,6,7,2,3,4,3,5,2,5,3,5,7,4,5,4, 2,6,3,6,5,3,2,5,6,5,2,3,6,1,5,10,3,2,1,3,4,4,2,6,6,0,5,6,4,3,5,4,2, 4,4,1,5,6,2,4,5,4,2,3,6,2,2,3,3,2,3,6,1,6,6,5,6,3,3,6,2,7,2,5,3,5,2, 2,2,5,4,5,6,6,3,3,5,5,6,4,4,2,7,5,4,3,4,3,5,1,6,5,7,6,5,3,4,6,1,1,7, 3,2,6,5,2,8,4,4,5,7,2,6,3,6,5,4,4,1,5,2,2,4,1,8,6,6,4,5,3,6,3,3,7,1, 2,5,3,6,9,6,4,2,6,2,6,6,4,2,5,2,2,7,1,5,6,4,4,1,6,4,3,3,8,3,6} N@{m = Mean@x, v = Variance@x} (* sample mean & variance *) {4.08, 3.3001} N@{n1 = m^2/(m-v), p1 = (m-v)/m} (* moment-based estimates *) {21.3443, 0.191152} GML[n_] := Exp[m*Log[m/n] + (n-m)Log[1-m/n] + Tr@Log@Binomial[n,x]/L] (* geometric mean likelihood, maximized w.r.t. p *) {n2 = NestWhile[{#[[1]]+1,GML[#[[1]]+1]}&, {Max@x,GML@Max@x}, #1[[2]] < #2[[2]]&, 2][[1]] - 1, p2 = N[m/n2]} (* maximum likelihood estimates *) {20, 0.204} ListPlot[Table[{n,GML[n]},{n,Max[x,Floor@n2-5],n2+5}], PlotRange->All, Frame->True, Axes->None]