Re: fit a BinomialDistribution to exptl data?
- To: mathgroup at smc.vnet.net
- Subject: [mg80692] Re: fit a BinomialDistribution to exptl data?
- From: Ray Koopman <koopman at sfu.ca>
- Date: Wed, 29 Aug 2007 04:14:18 -0400 (EDT)
- References: <fagso0$8ad$1@smc.vnet.net>
On Aug 22, 1:39 am, Gordon Robertson <agrobert... at telus.net> wrote:
> Given a list of data values, or a list of x-y data points for
> plotting the data as an empirical distribution function, how can I
> fit a BinomialDistribution to the data? The help documentation for
> FindFit shows examples in which the user indicates which function
> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
> and I've been unable to find an example in which a statistical
> distribution is being fit to data. Mathematica complains when I try the
> following with an xy list of data that specified an EDF: FindFit
> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>
> G
> --
> Gordon Robertson
> Canada's Michael Smith Genome Sciences Centre
> Vancouver BC Canada
For fixed n, the maximum likelihood estimate of p is m/n, where m is
the sample mean. Then walk n up from the sample maximum until the
likelihood is maximized.
{n = 20, p = .2}; (* true parameter values *)
L = 200; (* sample size *)
x = Total@UnitStep[p - Table[Random[],{n},{L}]] (* sample data *)
{5,3,4,5,4,3,4,4,5,3,1,4,5,3,2,4,4,5,3,6,7,2,3,4,3,5,2,5,3,5,7,4,5,4,
2,6,3,6,5,3,2,5,6,5,2,3,6,1,5,10,3,2,1,3,4,4,2,6,6,0,5,6,4,3,5,4,2,
4,4,1,5,6,2,4,5,4,2,3,6,2,2,3,3,2,3,6,1,6,6,5,6,3,3,6,2,7,2,5,3,5,2,
2,2,5,4,5,6,6,3,3,5,5,6,4,4,2,7,5,4,3,4,3,5,1,6,5,7,6,5,3,4,6,1,1,7,
3,2,6,5,2,8,4,4,5,7,2,6,3,6,5,4,4,1,5,2,2,4,1,8,6,6,4,5,3,6,3,3,7,1,
2,5,3,6,9,6,4,2,6,2,6,6,4,2,5,2,2,7,1,5,6,4,4,1,6,4,3,3,8,3,6}
N@{m = Mean@x, v = Variance@x} (* sample mean & variance *)
{4.08, 3.3001}
N@{n1 = m^2/(m-v), p1 = (m-v)/m} (* moment-based estimates *)
{21.3443, 0.191152}
GML[n_] := Exp[m*Log[m/n] + (n-m)Log[1-m/n] + Tr@Log@Binomial[n,x]/L]
(* geometric mean likelihood, maximized w.r.t. p *)
{n2 = NestWhile[{#[[1]]+1,GML[#[[1]]+1]}&, {Max@x,GML@Max@x},
#1[[2]] < #2[[2]]&, 2][[1]] - 1,
p2 = N[m/n2]} (* maximum likelihood estimates *)
{20, 0.204}
ListPlot[Table[{n,GML[n]},{n,Max[x,Floor@n2-5],n2+5}],
PlotRange->All, Frame->True, Axes->None]