Re: Re: Re: fit a BinomialDistribution to exptl data?

*To*: mathgroup at smc.vnet.net*Subject*: [mg80662] Re: [mg80590] Re: [mg80473] Re: [mg80415] fit a BinomialDistribution to exptl data?*From*: DrMajorBob <drmajorbob at bigfoot.com>*Date*: Tue, 28 Aug 2007 02:14:39 -0400 (EDT)*References*: <200708220838.EAA08485@smc.vnet.net> <23066758.1187864982434.JavaMail.root@m35> <200708260830.EAA11378@smc.vnet.net> <27217752.1188270373163.JavaMail.root@m35>*Reply-to*: drmajorbob at bigfoot.com

Here's a "best integer n" code for the FindFit method: glosemeyer1b[sample_] := Block[{err1, err2, edf = Sort@Tally@sample, n, p, k, fit, nn, pp, s1, s2}, edf[[All, 2]] = Accumulate@edf[[All, 2]]/Length@sample; {nn, pp} = {n, p} /. Quiet@FindFit[edf, CDF[BinomialDistribution[n, p], k], {{n, Max@sample + 1}, {p, .5}}, k]; {s1, s2} = {Quiet@ FindFit[edf, CDF[BinomialDistribution[Floor@nn, p], k], {{p, pp}}, k, NormFunction -> ((err1 = Norm@#) &)], Quiet@FindFit[edf, CDF[BinomialDistribution[Ceiling@nn, p], k], {{p, pp}}, k, NormFunction -> ((err2 = Norm@#) &)]}; If[err1 <= err2, {Floor@nn, p /. s1}, {Ceiling@nn, p /. s2}] ] That's usually the same, though, as rounding the n returned by: glosemeyer1[sample_] := Block[{edf = Sort@Tally@sample, n, p, k}, edf[[All, 2]] = Accumulate@edf[[All, 2]]/Length@sample; {n, p} /. Quiet@FindFit[edf, CDF[BinomialDistribution[n, p], k], {{n, Max@sample + 1}, {p, .5}}, k] ] and, in any case, it's just an estimate, so the more ornate version isn't worth the trouble, I suspect. Bobby On Mon, 27 Aug 2007 09:23:24 -0500, Darren Glosemeyer <darreng at wolfram.com> wrote: > You are right. I thought there was a Floor[n] in the CDF expression, but > there are only Floor[x]s. Either Floor[n] or Ceiling[n] could be the > value to choose. The log likelihood could be evaluated at both to > determine which gives the higher likelihood. > > Darren Glosemeyer > Wolfram Research > > DrMajorBob wrote: >>> Floor[n] is the value of n to take. >>> >> >> Is there a simple rationale for that? >> >> It seems to me the optimal integer n could lie on EITHER side of >> FindMaximum's or FindFit's optimal Real. >> >> Bobby >> >> On Thu, 23 Aug 2007 00:08:35 -0500, Darren Glosemeyer >> <darreng at wolfram.com> wrote: >> >> >>> Gordon Robertson wrote: >>> >>>> Given a list of data values, or a list of x-y data points for >>>> plotting the data as an empirical distribution function, how can I >>>> fit a BinomialDistribution to the data? The help documentation for >>>> FindFit shows examples in which the user indicates which function >>>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]), >>>> and I've been unable to find an example in which a statistical >>>> distribution is being fit to data. Mathematica complains when I try >>>> the >>>> following with an xy list of data that specified an EDF: FindFit >>>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k]. >>>> >>>> G >>>> -- >>>> Gordon Robertson >>>> Canada's Michael Smith Genome Sciences Centre >>>> Vancouver BC Canada >>>> >>>> >>>> >>> Non-default starting values are needed. By default, FindFit will use a >>> starting value of 1 for each parameter, which will be problematic in >>> this case. The starting value for n should be at least as large as the >>> largest binomial in the sample, and the value for pp should be strictly >>> between 0 and 1. Here is an example. >>> >>> In[1]:= binom = RandomInteger[BinomialDistribution[20, .4], 10] >>> >>> Out[1]= {10, 7, 5, 9, 8, 12, 7, 7, 10, 9} >>> >>> In[2]:= edf = Sort[Tally[binom]]; >>> >>> In[3]:= edf[[All, 2]] = Accumulate[edf[[All, 2]]]/Length[binom];= >>> >>> In[4]:= FindFit[edf, >>> CDF[BinomialDistribution[n, pp], k], {{n, Max[binom] + 1}, = = >>> {pp, >>> .5}}, >>> k] >>> >>> Out[4]= {n -> 17.3082, pp -> 0.4773} >>> >>> >>> Floor[n] is the value of n to take. >>> >>> Note that FindFit gives a least squares fit of the edf to the cdf. >>> Alternatively, a maximum likelihood estimate of the parameters can b= e >>> obtained by maximizing the log likelihood function (the sum of the l= ogs >>> of the pdf with unknown parameters evaluated at the data points) wit= h >>> respect to the parameters. >>> >>> >>> In[5]:= loglike PowerExpand[ >>> Total[Log[Map[PDF[BinomialDistribution[n, pp], #] &, = >>> binom]]]]; >>> >>> >>> Constraints should be used to keep the parameters in the feasible = >>> range. >>> >>> In[6]:= FindMaximum[{loglike, n >= Max[binom] && 0 < pp < 1}, {n= , >>> Max[binom] + 1}, {pp, .5}] >>> >>> Out[6]= {-20.6326, {n -> 14.9309, pp -> 0.56259}} >>> >>> >>> Darren Glosemeyer >>> Wolfram Research >>> >>> >>> >> >> >> >> > > -- = DrMajorBob at bigfoot.com

**References**:**fit a BinomialDistribution to exptl data?***From:*Gordon Robertson <agrobertson@telus.net>

**Re: Re: fit a BinomialDistribution to exptl data?***From:*DrMajorBob <drmajorbob@bigfoot.com>