MathGroup Archive 2007

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: fit a BinomialDistribution to exptl data?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg80524] Re: [mg80466] Re: fit a BinomialDistribution to exptl data?
  • From: DrMajorBob <drmajorbob at bigfoot.com>
  • Date: Fri, 24 Aug 2007 02:00:59 -0400 (EDT)
  • References: <fagso0$8ad$1@smc.vnet.net> <19213635.1187856845607.JavaMail.root@m35>
  • Reply-to: drmajorbob at bigfoot.com

Fitting the distribution itself is possible, but FindFit isn't the method  
I'd use.

Here's a binomial sample:

<< Histograms`
count = 200;
sample = Sort@RandomInteger[BinomialDistribution[20, .2], count];
Histogram@sample

and here's (a version of) the empirical distribution, plotted alongside 
the source:

empirical =
   Interpolation[
    Join[{{0, 0}},
     Transpose@{Range@Length@# - 1/2, #} &@(Accumulate@
         Rest@BinCounts[sample, 1]/count), {{count, 1}}],
    InterpolationOrder -> 1];
Plot[{CDF[BinomialDistribution[20, 0.2], x], empirical@x}, {x, 0, 20}]

We can get a kind of least-squares fit this way:

Clear[error, errorSum, bestP]
error[n_Integer, p_][k_] :=
  CDF[BinomialDistribution[n, p], k] - empirical@k
errorSum[n_Integer, p_] := #.# &[error[n, p] /@ Range[0, n]]
bestP[n_] :=
  Block[{p},
   NMinimize[{errorSum[n, p], 0 < p < 1},
     p] /. {p -> z_} :> {"p" -> z, "n" -> n}]

First@Sort[bestP /@ Range[25]]
{n1, p1} = {"n", "p"} /. Last@%

{0.000322862, {"p" -> 0.184778, "n" -> 25}}

{25, 0.184778}

A far simpler method, of course, is

Through[{Mean, Variance}@sample] ==
   Through[{Mean, Variance}@BinomialDistribution[n, p]] // Thread
Solve[%, {n, p}] // N
{n2 = Round@n /. First@%, p2 = Mean@sample/n2 // N}

{79/20 == n p, 1115/398 == n (1 - p) p}

{{n -> 13.5852, p -> 0.290758}}

{14, 0.282143}

errorSum[n2, p2]

0.0425404

Here's a plot of both solutions versus "the" empirical distribution:

Show[Plot[empirical@x, {x, 0, Max[n1, n2]}],
  Plot[CDF[BinomialDistribution[n1, p1], x], {x, 0, n1},
   PlotStyle -> Directive[Red, Dashed]],
  Plot[CDF[BinomialDistribution[Round@n2, p2], x], {x, 0, n2}],
  PlotRange -> All, AspectRatio -> 1, Frame -> True]

Bobby

On Thu, 23 Aug 2007 00:04:58 -0500, dh <dh at metrohm.ch> wrote:

>
>
> Hi Gordon,
>
> FindFit will fit functions that depend on continuous parameters. n is a
>
> discrete parameter. Mathematica choose to give a step function for  
> non-integer
>
> n, however, FindFit will fail because small change in n will mostly not
>
> change the function value. Therefore, what you can do is to replace
>
> CDF[..] by a function that is not a step function in n, e.g.:
>
> myFun[n_,p_,k_]:=CDF[BinomialDistribution[Floor[n],pp],k]+(CDF[BinomialDistribution[Ceiling[n],pp],k]-CDF[BinomialDistribution[Floor[n],pp], k])(n-Floor[n])
>
> this is certainly not tuned for speed, but will make FindFit happy.
>
> hope this helps, Daniel
>
>
>
> Gordon Robertson wrote:
>
>> Given a list of data values, or a list of x-y data points for
>
>> plotting the data as an empirical distribution function, how can I
>
>> fit a BinomialDistribution to the data? The help documentation for
>
>> FindFit shows examples in which the user indicates which function
>
>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
>
>> and I've been unable to find an example in which a statistical
>
>> distribution is being fit to data. Mathematica complains when I try the
>
>> following with an xy list of data that specified an EDF: FindFit
>
>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>
>>
>
>> G
>
>> --
>
>> Gordon Robertson
>
>> Canada's Michael Smith Genome Sciences Centre
>
>> Vancouver BC Canada
>
>>
>
>>
>
>
>
>



-- 

DrMajorBob at bigfoot.com


  • Prev by Date: Re: ListSurfacePlot3D in Mathematica Version 6
  • Next by Date: the temperamental loop or something wrong with my expression
  • Previous by thread: Re: fit a BinomialDistribution to exptl data?
  • Next by thread: Re: fit a BinomialDistribution to exptl data?