Re: fit a BinomialDistribution to exptl data?

• To: mathgroup at smc.vnet.net
• Subject: [mg80473] Re: [mg80415] fit a BinomialDistribution to exptl data?
• From: Darren Glosemeyer <darreng at wolfram.com>
• Date: Thu, 23 Aug 2007 01:08:35 -0400 (EDT)
• References: <200708220838.EAA08485@smc.vnet.net>

```Gordon Robertson wrote:
> Given a list of data values, or a list of x-y data points for
> plotting the data as an empirical distribution function, how can I
> fit a BinomialDistribution to the data? The help documentation for
> FindFit shows examples in which the user indicates which function
> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
> and I've been unable to find an example in which a statistical
> distribution is being fit to data. Mathematica complains when I try the
> following with an xy list of data that specified an EDF: FindFit
> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>
> G
> --
> Gordon Robertson
> Canada's Michael Smith Genome Sciences Centre
> Vancouver BC Canada
>
>

Non-default starting values are needed. By default, FindFit will use a
starting value of 1 for each parameter, which will be problematic in
this case.  The starting value for n should be at least as large as the
largest binomial in the sample, and the value for pp should be strictly
between 0 and 1. Here is an example.

In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10]

Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9}

In[2]:= edf = Sort[Tally[binom]];

In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom];

In[4]:= FindFit[edf,
CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, {pp,
.5}},
k]

Out[4]= {n -> 17.3082, pp -> 0.4773}

Floor[n] is the value of n to take.

Note that FindFit gives a least squares fit of the edf to the cdf.
Alternatively, a maximum likelihood estimate of the parameters can be
obtained by maximizing the log likelihood function (the sum of the logs
of the pdf with unknown parameters evaluated at the data points) with
respect to the parameters.

In[5]:= loglike =
PowerExpand[
Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &, binom]]]];

Constraints should be used to keep the parameters in the feasible range.

In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n,
Max[binom] + 1}, {pp, .5}]

Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}}

Darren Glosemeyer
Wolfram Research

```

• Prev by Date: Re: REQ: Kuratowski graphs on Möbius band and torus
• Next by Date: Re: Unit testing in Mathematica or Wolfram Workbench
• Previous by thread: fit a BinomialDistribution to exptl data?
• Next by thread: Re: Re: fit a BinomialDistribution to exptl data?