[Date Index]
[Thread Index]
[Author Index]
Re: Re: fit a BinomialDistribution to exptl data?
*To*: mathgroup at smc.vnet.net
*Subject*: [mg80524] Re: [mg80466] Re: fit a BinomialDistribution to exptl data?
*From*: DrMajorBob <drmajorbob at bigfoot.com>
*Date*: Fri, 24 Aug 2007 02:00:59 -0400 (EDT)
*References*: <fagso0$8ad$1@smc.vnet.net> <19213635.1187856845607.JavaMail.root@m35>
*Reply-to*: drmajorbob at bigfoot.com
Fitting the distribution itself is possible, but FindFit isn't the method
I'd use.
Here's a binomial sample:
<< Histograms`
count = 200;
sample = Sort@RandomInteger[BinomialDistribution[20, .2], count];
Histogram@sample
and here's (a version of) the empirical distribution, plotted alongside
the source:
empirical =
Interpolation[
Join[{{0, 0}},
Transpose@{Range@Length@# - 1/2, #} &@(Accumulate@
Rest@BinCounts[sample, 1]/count), {{count, 1}}],
InterpolationOrder -> 1];
Plot[{CDF[BinomialDistribution[20, 0.2], x], empirical@x}, {x, 0, 20}]
We can get a kind of least-squares fit this way:
Clear[error, errorSum, bestP]
error[n_Integer, p_][k_] :=
CDF[BinomialDistribution[n, p], k] - empirical@k
errorSum[n_Integer, p_] := #.# &[error[n, p] /@ Range[0, n]]
bestP[n_] :=
Block[{p},
NMinimize[{errorSum[n, p], 0 < p < 1},
p] /. {p -> z_} :> {"p" -> z, "n" -> n}]
First@Sort[bestP /@ Range[25]]
{n1, p1} = {"n", "p"} /. Last@%
{0.000322862, {"p" -> 0.184778, "n" -> 25}}
{25, 0.184778}
A far simpler method, of course, is
Through[{Mean, Variance}@sample] ==
Through[{Mean, Variance}@BinomialDistribution[n, p]] // Thread
Solve[%, {n, p}] // N
{n2 = Round@n /. First@%, p2 = Mean@sample/n2 // N}
{79/20 == n p, 1115/398 == n (1 - p) p}
{{n -> 13.5852, p -> 0.290758}}
{14, 0.282143}
errorSum[n2, p2]
0.0425404
Here's a plot of both solutions versus "the" empirical distribution:
Show[Plot[empirical@x, {x, 0, Max[n1, n2]}],
Plot[CDF[BinomialDistribution[n1, p1], x], {x, 0, n1},
PlotStyle -> Directive[Red, Dashed]],
Plot[CDF[BinomialDistribution[Round@n2, p2], x], {x, 0, n2}],
PlotRange -> All, AspectRatio -> 1, Frame -> True]
Bobby
On Thu, 23 Aug 2007 00:04:58 -0500, dh <dh at metrohm.ch> wrote:
>
>
> Hi Gordon,
>
> FindFit will fit functions that depend on continuous parameters. n is a
>
> discrete parameter. Mathematica choose to give a step function for
> non-integer
>
> n, however, FindFit will fail because small change in n will mostly not
>
> change the function value. Therefore, what you can do is to replace
>
> CDF[..] by a function that is not a step function in n, e.g.:
>
> myFun[n_,p_,k_]:=CDF[BinomialDistribution[Floor[n],pp],k]+(CDF[BinomialDistribution[Ceiling[n],pp],k]-CDF[BinomialDistribution[Floor[n],pp], k])(n-Floor[n])
>
> this is certainly not tuned for speed, but will make FindFit happy.
>
> hope this helps, Daniel
>
>
>
> Gordon Robertson wrote:
>
>> Given a list of data values, or a list of x-y data points for
>
>> plotting the data as an empirical distribution function, how can I
>
>> fit a BinomialDistribution to the data? The help documentation for
>
>> FindFit shows examples in which the user indicates which function
>
>> should be fit (e.g. FindFit[data, a x Log[b + c x], {a, b, c}, x]),
>
>> and I've been unable to find an example in which a statistical
>
>> distribution is being fit to data. Mathematica complains when I try the
>
>> following with an xy list of data that specified an EDF: FindFit
>
>> [xyvals, CDF[BinomialDistribution[n, pp], k], {n, pp}, k].
>
>>
>
>> G
>
>> --
>
>> Gordon Robertson
>
>> Canada's Michael Smith Genome Sciences Centre
>
>> Vancouver BC Canada
>
>>
>
>>
>
>
>
>
--
DrMajorBob at bigfoot.com
Prev by Date:
**Re: ListSurfacePlot3D in Mathematica Version 6**
Next by Date:
**the temperamental loop or something wrong with my expression**
Previous by thread:
**Re: fit a BinomialDistribution to exptl data?**
Next by thread:
**Re: fit a BinomialDistribution to exptl data?**
| |