Re: Re: Re: fit a BinomialDistribution

*To*: mathgroup at smc.vnet.net*Subject*: [mg80649] Re: [mg80590] Re: [mg80473] Re: [mg80415] fit a BinomialDistribution*From*: Darren Glosemeyer <darreng at wolfram.com>*Date*: Tue, 28 Aug 2007 02:07:54 -0400 (EDT)*References*: <200708220838.EAA08485@smc.vnet.net> <23066758.1187864982434.JavaMail.root@m35> <200708260830.EAA11378@smc.vnet.net>

You are right. I thought there was a Floor[n] in the CDF expression, but there are only Floor[x]s. Either Floor[n] or Ceiling[n] could be the value to choose. The log likelihood could be evaluated at both to determine which gives the higher likelihood. Darren Glosemeyer Wolfram Research DrMajorBob wrote: >> Floor[n] is the value of n to take. >> > > Is there a simple rationale for that? > > It seems to me the optimal integer n could lie on EITHER side of > FindMaximum's or FindFit's optimal Real. > > Bobby > > On Thu, 23 Aug 2007 00:08:35 -0500, Darren Glosemeyer > <darreng at wolfram.com> wrote: > > >> Gordon Robertson wrote: >> >>> Given a list of data values, or a list of x-y data points for >>> plotting the data as an empirical distribution function, how can I >>> fit a BinomialDistribution to the data? The help documentation for >>> FindFit shows examples in which the user indicates which function >>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]), >>> and I've been unable to find an example in which a statistical >>> distribution is being fit to data. Mathematica complains when I try the >>> following with an xy list of data that specified an EDF: FindFit >>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k]. >>> >>> G >>> -- >>> Gordon Robertson >>> Canada's Michael Smith Genome Sciences Centre >>> Vancouver BC Canada >>> >>> >>> >> Non-default starting values are needed. By default, FindFit will use a >> starting value of 1 for each parameter, which will be problematic in >> this case. The starting value for n should be at least as large as the >> largest binomial in the sample, and the value for pp should be strictly >> between 0 and 1. Here is an example. >> >> In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10] >> >> Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9} >> >> In[2]:= edf = Sort[Tally[binom]]; >> >> In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom]; >> >> In[4]:= FindFit[edf, >> CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, {pp, >> .5}}, >> k] >> >> Out[4]= {n -> 17.3082, pp -> 0.4773} >> >> >> Floor[n] is the value of n to take. >> >> Note that FindFit gives a least squares fit of the edf to the cdf. >> Alternatively, a maximum likelihood estimate of the parameters can be >> obtained by maximizing the log likelihood function (the sum of the logs >> of the pdf with unknown parameters evaluated at the data points) with >> respect to the parameters. >> >> >> In[5]:= loglike >> PowerExpand[ >> Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &, >> binom]]]]; >> >> >> Constraints should be used to keep the parameters in the feasible range. >> >> In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n, >> Max[binom] + 1}, {pp, .5}] >> >> Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}} >> >> >> Darren Glosemeyer >> Wolfram Research >> >> >> > > > >

**References**:**fit a BinomialDistribution to exptl data?***From:*Gordon Robertson <agrobertson@telus.net>

**Re: Re: fit a BinomialDistribution to exptl data?***From:*DrMajorBob <drmajorbob@bigfoot.com>