Re: Re: fit a BinomialDistribution to exptl data?

*To*: mathgroup at smc.vnet.net*Subject*: [mg80524] Re: [mg80466] Re: fit a BinomialDistribution to exptl data?*From*: DrMajorBob <drmajorbob at bigfoot.com>*Date*: Fri, 24 Aug 2007 02:00:59 -0400 (EDT)*References*: <fagso0$8ad$1@smc.vnet.net> <19213635.1187856845607.JavaMail.root@m35>*Reply-to*: drmajorbob at bigfoot.com

Fitting the distribution itself is possible, but FindFit isn't the method I'd use. Here's a binomial sample: << Histograms` count = 200; sample = Sort@RandomInteger[BinomialDistribution[20, .2], count]; Histogram@sample and here's (a version of) the empirical distribution, plotted alongside the source: empirical = Interpolation[ Join[{{0, 0}}, Transpose@{Range@Length@# - 1/2, #} &@(Accumulate@ Rest@BinCounts[sample, 1]/count), {{count, 1}}], InterpolationOrder -> 1]; Plot[{CDF[BinomialDistribution[20, 0.2], x], empirical@x}, {x, 0, 20}] We can get a kind of least-squares fit this way: Clear[error, errorSum, bestP] error[n_Integer, p_][k_] := CDF[BinomialDistribution[n, p], k] - empirical@k errorSum[n_Integer, p_] := #.# &[error[n, p] /@ Range[0, n]] bestP[n_] := Block[{p}, NMinimize[{errorSum[n, p], 0 < p < 1}, p] /. {p -> z_} :> {"p" -> z, "n" -> n}] First@Sort[bestP /@ Range[25]] {n1, p1} = {"n", "p"} /. Last@% {0.000322862, {"p" -> 0.184778, "n" -> 25}} {25, 0.184778} A far simpler method, of course, is Through[{Mean, Variance}@sample] == Through[{Mean, Variance}@BinomialDistribution[n, p]] // Thread Solve[%, {n, p}] // N {n2 = Round@n /. First@%, p2 = Mean@sample/n2 // N} {79/20 == n p, 1115/398 == n (1 - p) p} {{n -> 13.5852, p -> 0.290758}} {14, 0.282143} errorSum[n2, p2] 0.0425404 Here's a plot of both solutions versus "the" empirical distribution: Show[Plot[empirical@x, {x, 0, Max[n1, n2]}], Plot[CDF[BinomialDistribution[n1, p1], x], {x, 0, n1}, PlotStyle -> Directive[Red, Dashed]], Plot[CDF[BinomialDistribution[Round@n2, p2], x], {x, 0, n2}], PlotRange -> All, AspectRatio -> 1, Frame -> True] Bobby On Thu, 23 Aug 2007 00:04:58 -0500, dh <dh at metrohm.ch> wrote: > > > Hi Gordon, > > FindFit will fit functions that depend on continuous parameters. n is a > > discrete parameter. Mathematica choose to give a step function for > non-integer > > n, however, FindFit will fail because small change in n will mostly not > > change the function value. Therefore, what you can do is to replace > > CDF[..] by a function that is not a step function in n, e.g.: > > myFun[n_,p_,k_]:=CDF[BinomialDistribution[Floor[n],pp],k]+(CDF[BinomialDistribution[Ceiling[n],pp],k]-CDF[BinomialDistribution[Floor[n],pp], k])(n-Floor[n]) > > this is certainly not tuned for speed, but will make FindFit happy. > > hope this helps, Daniel > > > > Gordon Robertson wrote: > >> Given a list of data values, or a list of x-y data points for > >> plotting the data as an empirical distribution function, how can I > >> fit a BinomialDistribution to the data? The help documentation for > >> FindFit shows examples in which the user indicates which function > >> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]), > >> and I've been unable to find an example in which a statistical > >> distribution is being fit to data. Mathematica complains when I try the > >> following with an xy list of data that specified an EDF: FindFit > >> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k]. > >> > >> G > >> -- > >> Gordon Robertson > >> Canada's Michael Smith Genome Sciences Centre > >> Vancouver BC Canada > >> > >> > > > > -- DrMajorBob at bigfoot.com