MathGroup Archive 2007

Re: fit a BinomialDistribution to exptl data?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg80473] Re: [mg80415] fit a BinomialDistribution to exptl data?
  • From: Darren Glosemeyer <darreng at wolfram.com>
  • Date: Thu, 23 Aug 2007 01:08:35 -0400 (EDT)
  • References: <200708220838.EAA08485@smc.vnet.net>

Gordon Robertson wrote:
> Given a list of data values, or a list of x-y data points for  
> plotting the data as an empirical distribution function, how can I  
> fit a BinomialDistribution to the data? The help documentation for  
> FindFit shows examples in which the user indicates which function  
> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),  
> and I've been unable to find an example in which a statistical  
> distribution is being fit to data. Mathematica complains when I try the  
> following with an xy list of data that specified an EDF:
> FindFit[xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>
> G
> --
> Gordon Robertson
> Canada's Michael Smith Genome Sciences Centre
> Vancouver BC Canada
>
>   

Non-default starting values are needed. By default, FindFit uses a 
starting value of 1 for each parameter, which is problematic here: 
with n = 1 the binomial can only take the values 0 and 1, and pp = 1 
lies on the boundary of its allowed range. The starting value for n 
should be at least as large as the largest value in the sample, and the 
value for pp should be strictly between 0 and 1. Here is an example.

In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10]

Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9}

In[2]:= edf = Sort[Tally[binom]];

In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom];

In[4]:= FindFit[edf, CDF[BinomialDistribution[n, pp], k],
          {{n, Max[binom] + 1}, {pp, .5}}, k]

Out[4]= {n -> 17.3082, pp -> 0.4773}


Since n must be an integer, take Floor[n] (here 17) as the estimate of n.
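For readers who want to check the least-squares idea outside Mathematica, here is a rough Python sketch using only the standard library. It rebuilds the EDF the way In[2] and In[3] do (tally, sort, accumulate, divide by the sample size) and measures the squared gap to the binomial CDF. The helper names (empirical_cdf, binomial_cdf, squared_error, grid_fit) are hypothetical, and the brute-force search over integer n and a grid of p values is a swapped-in stand-in for FindFit's continuous search, not what FindFit does internally.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def empirical_cdf(data):
    """Mirror Sort[Tally[data]] followed by Accumulate[...]/Length[data]."""
    edf, running = [], 0
    for value, count in sorted(Counter(data).items()):
        running += count
        # Exact rationals, as in the Mathematica session.
        edf.append((value, Fraction(running, len(data))))
    return edf


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); this sketch assumes an integer n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def squared_error(edf, n, p):
    """Least-squares objective: squared gaps between the EDF and the binomial CDF."""
    return sum((float(f) - binomial_cdf(k, n, p)) ** 2 for k, f in edf)


def grid_fit(data, n_max, steps=99):
    """Hypothetical brute-force fit over integer n and p = 1/(steps+1), ..., steps/(steps+1)."""
    edf = empirical_cdf(data)
    best = None
    for n in range(max(data), n_max + 1):  # n must cover the largest observation
        for j in range(1, steps + 1):
            p = j / (steps + 1)  # strictly between 0 and 1
            err = squared_error(edf, n, p)
            if best is None or err < best[0]:
                best = (err, n, p)
    return best


binom = [10, 7, 5, 9, 8, 12, 7, 7, 10, 9]  # the sample shown in Out[1]
print(empirical_cdf(binom))
print(grid_fit(binom, n_max=40))
```

Restricting n to integers sidesteps the Floor step; the grid resolution (steps) and the cap n_max are arbitrary choices in this sketch.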

Note that FindFit gives a least squares fit of the edf to the cdf. 
Alternatively, a maximum likelihood estimate of the parameters can be 
obtained by maximizing the log likelihood function (the sum of the logs 
of the pdf with unknown parameters evaluated at the data points) with 
respect to the parameters.


In[5]:= loglike =
          PowerExpand[
           Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &, binom]]]];


Constraints should be used to keep the parameters in the feasible range.

In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1},
          {n, Max[binom] + 1}, {pp, .5}]

Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}}
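Since n must ultimately be an integer, another option is to maximize the likelihood over integer n directly. The Python sketch below (standard library only; profile_mle and n_max are hypothetical names) uses the fact that for fixed n the likelihood is maximized at p = mean/n, then scans n from the sample maximum up to a cap. This profile-likelihood scan is a swapped-in technique, not what FindMaximum does. The cap matters: for samples whose variance is not below their mean, the profile likelihood may keep increasing with n.

```python
from math import lgamma, log


def binomial_loglik(data, n, p):
    """Log likelihood of a Binomial(n, p) sample (n may be non-integer via lgamma)."""
    if not (0 < p < 1) or n < max(data):
        return float("-inf")  # outside the feasible region
    return sum(
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
        for k in data
    )


def profile_mle(data, n_max):
    """Hypothetical helper: scan integer n in [max(data), n_max] with p = mean/n."""
    mean = sum(data) / len(data)
    best = None
    for n in range(max(data), n_max + 1):
        p = mean / n  # closed-form maximizer of the likelihood in p for this n
        ll = binomial_loglik(data, n, p)
        if best is None or ll > best[0]:
            best = (ll, n, p)
    return best  # (log likelihood, n, p)


binom = [10, 7, 5, 9, 8, 12, 7, 7, 10, 9]  # the sample shown in Out[1]
print(profile_mle(binom, n_max=200))
```

For the sample in Out[1] this scan should select n = 15 with p = 8.4/15 = 0.56, next to the continuous estimate n -> 14.9309 in Out[6].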


Darren Glosemeyer
Wolfram Research