MathGroup Archive: August 2007 [01040]

[Date Index] [Thread Index] [Author Index]

Re: Re: Re: fit a BinomialDistribution to exptl data?

To: mathgroup at smc.vnet.net
Subject: [mg80662] Re: [mg80590] Re: [mg80473] Re: [mg80415] fit a BinomialDistribution to exptl data?
From: DrMajorBob <drmajorbob at bigfoot.com>
Date: Tue, 28 Aug 2007 02:14:39 -0400 (EDT)
References: <200708220838.EAA08485@smc.vnet.net> <23066758.1187864982434.JavaMail.root@m35> <200708260830.EAA11378@smc.vnet.net> <27217752.1188270373163.JavaMail.root@m35>
Reply-to: drmajorbob at bigfoot.com

Here's a "best integer n" code for the FindFit method:

glosemeyer1b[sample_] :=
  Block[{err1, err2, edf = Sort@Tally@sample, n, p, k, fit, nn, pp, s1,
     s2},
   edf[[All, 2]] = Accumulate@edf[[All, 2]]/Length@sample;
   {nn, pp} = {n, p} /.
     Quiet@FindFit[edf,
       CDF[BinomialDistribution[n, p],
        k], {{n, Max@sample + 1}, {p, .5}}, k];
   {s1, s2} = {Quiet@
      FindFit[edf,
       CDF[BinomialDistribution[Floor@nn, p], k], {{p, pp}}, k,
       NormFunction -> ((err1 = Norm@#) &)],
     Quiet@FindFit[edf,
       CDF[BinomialDistribution[Ceiling@nn, p], k], {{p, pp}}, k,
       NormFunction -> ((err2 = Norm@#) &)]};
   If[err1 <= err2, {Floor@nn, p /. s1}, {Ceiling@nn, p /. s2}]
   ]

That's usually the same, though, as rounding the n returned by:

glosemeyer1[sample_] := Block[{edf = Sort@Tally@sample, n, p, k},
   edf[[All, 2]] = Accumulate@edf[[All, 2]]/Length@sample;
   {n, p} /.
    Quiet@FindFit[edf,
      CDF[BinomialDistribution[n, p],
       k], {{n, Max@sample + 1}, {p, .5}}, k]
   ]

and, in any case, it's just an estimate, so the more ornate version isn't  
worth the trouble, I suspect.

Bobby

On Mon, 27 Aug 2007 09:23:24 -0500, Darren Glosemeyer 
<darreng at wolfram.com> wrote:

> You are right. I thought there was a Floor[n] in the CDF expression, but  
> there are only Floor[x]s. Either Floor[n] or Ceiling[n] could be the  
> value to choose. The log likelihood could be evaluated at both to  
> determine which gives the higher likelihood.
>
> Darren Glosemeyer
> Wolfram Research
>
> DrMajorBob wrote:
>>> Floor[n] is the value of n to take.
>>>
>>
>> Is there a simple rationale for that?
>>
>> It seems to me the optimal integer n could lie on EITHER side of   
>> FindMaximum's or FindFit's optimal Real.
>>
>> Bobby
>>
>> On Thu, 23 Aug 2007 00:08:35 -0500, Darren Glosemeyer   
>> <darreng at wolfram.com> wrote:
>>
>>
>>> Gordon Robertson wrote:
>>>
>>>> Given a list of data values, or a list of x-y data points for
>>>> plotting the data as an empirical distribution function, how can I
>>>> fit a BinomialDistribution to the data? The help documentation for
>>>> FindFit shows examples in which the user indicates which function
>>>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
>>>> and I've been unable to find an example in which a statistical
>>>> distribution is being fit to data. Mathematica complains when I try 
>>>> the
>>>> following with an xy list of data that specified an EDF: FindFit
>>>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>>>>
>>>> G
>>>> --
>>>> Gordon Robertson
>>>> Canada's Michael Smith Genome Sciences Centre
>>>> Vancouver BC Canada
>>>>
>>>>
>>>>
>>> Non-default starting values are needed. By default, FindFit will use a
>>> starting value of 1 for each parameter, which will be problematic in
>>> this case.  The starting value for n should be at least as large as the
>>> largest binomial in the sample, and the value for pp should be strictly
>>> between 0 and 1. Here is an example.
>>>
>>> In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10]
>>>
>>> Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9}
>>>
>>> In[2]:= edf = Sort[Tally[binom]];
>>>
>>> In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom];=

>>>
>>> In[4]:= FindFit[edf,
>>>          CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, =
 =

>>> {pp,
>>> .5}},
>>>           k]
>>>
>>> Out[4]= {n -> 17.3082, pp -> 0.4773}
>>>
>>>
>>> Floor[n] is the value of n to take.
>>>
>>> Note that FindFit gives a least squares fit of the edf to the cdf.
>>> Alternatively, a maximum likelihood estimate of the parameters can b=
e
>>> obtained by maximizing the log likelihood function (the sum of the l=
ogs
>>> of the pdf with unknown parameters evaluated at the data points) wit=
h
>>> respect to the parameters.
>>>
>>>
>>> In[5]:= loglike           PowerExpand[
>>>            Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &,   =

>>> binom]]]];
>>>
>>>
>>> Constraints should be used to keep the parameters in the feasible  =

>>> range.
>>>
>>> In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n=
,
>>>           Max[binom] + 1}, {pp, .5}]
>>>
>>> Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}}
>>>
>>>
>>> Darren Glosemeyer
>>> Wolfram Research
>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- =

DrMajorBob at bigfoot.com

References:
- fit a BinomialDistribution to exptl data?
  - From: Gordon Robertson <agrobertson@telus.net>
- Re: Re: fit a BinomialDistribution to exptl data?
  - From: DrMajorBob <drmajorbob@bigfoot.com>

Prev by Date: Re: Re: Re: fit a BinomialDistribution to exptl data?

Next by Date: Re: Re: record intermediate steps

Previous by thread: Re: Re: Re: fit a BinomialDistribution to exptl data?

Next by thread: Re: fit a BinomialDistribution to exptl data?