MathGroup Archive: August 2007 [01036]

[Date Index] [Thread Index] [Author Index]

Re: Re: Re: fit a BinomialDistribution

To: mathgroup at smc.vnet.net
Subject: [mg80649] Re: [mg80590] Re: [mg80473] Re: [mg80415] fit a BinomialDistribution
From: Darren Glosemeyer <darreng at wolfram.com>
Date: Tue, 28 Aug 2007 02:07:54 -0400 (EDT)
References: <200708220838.EAA08485@smc.vnet.net> <23066758.1187864982434.JavaMail.root@m35> <200708260830.EAA11378@smc.vnet.net>

You are right. I thought there was a Floor[n] in the CDF expression, but 
there are only Floor[x]s. Either Floor[n] or Ceiling[n] could be the 
value to choose. The log likelihood could be evaluated at both to 
determine which gives the higher likelihood.

Darren Glosemeyer
Wolfram Research

DrMajorBob wrote:
>> Floor[n] is the value of n to take.
>>     
>
> Is there a simple rationale for that?
>
> It seems to me the optimal integer n could lie on EITHER side of  
> FindMaximum's or FindFit's optimal Real.
>
> Bobby
>
> On Thu, 23 Aug 2007 00:08:35 -0500, Darren Glosemeyer  
> <darreng at wolfram.com> wrote:
>
>   
>> Gordon Robertson wrote:
>>     
>>> Given a list of data values, or a list of x-y data points for
>>> plotting the data as an empirical distribution function, how can I
>>> fit a BinomialDistribution to the data? The help documentation for
>>> FindFit shows examples in which the user indicates which function
>>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
>>> and I've been unable to find an example in which a statistical
>>> distribution is being fit to data. Mathematica complains when I try the
>>> following with an xy list of data that specified an EDF: FindFit
>>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>>>
>>> G
>>> --
>>> Gordon Robertson
>>> Canada's Michael Smith Genome Sciences Centre
>>> Vancouver BC Canada
>>>
>>>
>>>       
>> Non-default starting values are needed. By default, FindFit will use a
>> starting value of 1 for each parameter, which will be problematic in
>> this case.  The starting value for n should be at least as large as the
>> largest binomial in the sample, and the value for pp should be strictly
>> between 0 and 1. Here is an example.
>>
>> In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10]
>>
>> Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9}
>>
>> In[2]:= edf = Sort[Tally[binom]];
>>
>> In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom];
>>
>> In[4]:= FindFit[edf,
>>          CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, {pp,
>> .5}},
>>           k]
>>
>> Out[4]= {n -> 17.3082, pp -> 0.4773}
>>
>>
>> Floor[n] is the value of n to take.
>>
>> Note that FindFit gives a least squares fit of the edf to the cdf.
>> Alternatively, a maximum likelihood estimate of the parameters can be
>> obtained by maximizing the log likelihood function (the sum of the logs
>> of the pdf with unknown parameters evaluated at the data points) with
>> respect to the parameters.
>>
>>
>> In[5]:= loglike 
>>           PowerExpand[
>>            Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &,  
>> binom]]]];
>>
>>
>> Constraints should be used to keep the parameters in the feasible range.
>>
>> In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n,
>>           Max[binom] + 1}, {pp, .5}]
>>
>> Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}}
>>
>>
>> Darren Glosemeyer
>> Wolfram Research
>>
>>
>>     
>
>
>
>

References:
- fit a BinomialDistribution to exptl data?
  - From: Gordon Robertson <agrobertson@telus.net>
- Re: Re: fit a BinomialDistribution to exptl data?
  - From: DrMajorBob <drmajorbob@bigfoot.com>

Prev by Date: Re: Mathematica 6.0.1, Crashes when I hit the ENTER key !!

Next by Date: Re: Manipulate a VectorFieldPlot3D

Previous by thread: Re: Re: fit a BinomialDistribution to exptl data?

Next by thread: Re: Re: Re: fit a BinomialDistribution to exptl data?