MathGroup Archive: September 2007 [00148]

[Date Index] [Thread Index] [Author Index]
Re: fit a BinomialDistribution to exptl data?
To: mathgroup at smc.vnet.net
Subject: [mg80915] Re: fit a BinomialDistribution to exptl data?
From: Mark Fisher <particlefilter at gmail.com>
Date: Wed, 5 Sep 2007 02:58:09 -0400 (EDT)
References: <11420600.1188458042685.JavaMail.root@m35>
On Sep 1, 12:43 am, Ray Koopman <koop... at sfu.ca> wrote:
> Also, it is possible that
> the ML estimate of n may not exist because the likelihood, maximized
> w.r.t. p given n, may be monotone increasing in n, as is the case
> with the following 200 samples from a Binomial[20,.2] distribution:
>
> {x = {0,  1,  2,  3,  4,  5,  6, 7, 8, 9, 10, 11},
>  f = {3, 21, 28, 37, 43, 28, 22, 8, 6, 2,  1,  1}};
>
> which has {mean, variance} = {3.875, 4.02952}. It is not obvious
> (except perhaps to a Bayesian) what should be done in cases such as
> this.

It seems this thread, which began as a discussion of fitting a
functional forms, has evolved into a discussion of statistical
inference. (The two subjects are related, but distinct.) My post is
intended to contribute to the statistical inference discussion by
providing a Bayesian perspective. (The code is written using Version
6.)

The Bayesian approach produces probability statements about the
"unknowns" given the the data. In this context, the data are those
given by Ray Koopman and the unknowns are n and p, the parameters of
the binomial distribution. In order to make probability statements
about about the joint distribution for (n,p) given the data, the rules
of probability require us to specify a prior distribution that
characterizes what we knew before we saw the data.

Some notation will be helpful. Let f(n,p) denote the joint prior
distribution, let f(n,p|D) denote the joint posterior distribution
given the data D, and let f(D|n,p) denote the sampling distribution of
the data given the parameters. (In a Bayesian analysis, the sampling
distribution becomes the "likelihood" of the parameters given the
data.) Bayes' rule says f(n,p|D) is proportional to g(n,p) :=  f(n,p)
* f(D|n,p). The typical procedure is to generate draws from f(n,p|D)
using MCMC techniques (Markov Chain Monte Carlo).

In this example, I take the following route instead. First, factor the
prior into marginal and conditional: f(n,p) = f(n) * f(p|n). Second,
integrate p out of both sides of Bayes' rule, so that f(n|D) is
proportional to g(n) := f(D|n) * f(n). Finally, generate draws from
the joint posterior by drawing successively from f(n|D) and then from
f(p|n,D), the latter of which is proportional to g(p) := f(D|n,p) *
f(p|n).

For this example, I characterize the prior knowledge about n and p as
follows: (1) n and p are independent, so that f(n,p) = f(n) * f(p),
(2) all values of p are equally likely, so that f(p) = 1 for 0 <= p <=
1, and (3) the probability of n is given by f(n) = .03*(1-.03)^(n-1).
(I'll comment on these choices at the end.)

pprior[p_] = 1
nprior[n_] = .03*(1-.03)^(n-1)

The data are those given by Ray Koopman:

data0 = Transpose[{
	{0,  1,  2,  3,  4,  5,  6, 7, 8, 9, 10, 11},
	{3, 21, 28, 37, 43, 28, 22, 8, 6, 2,  1,  1}}];
data = Flatten[ConstantArray @@@ data0];

The likelihood:

jointlikelihood[n_, p_] =
  Times @@ (PDF[BinomialDistribution[n, p], #] & /@ data);

An aside: Ray Koopman's comment about the absence of a maximum refers
to h(n) := max_p f(D|n,p), which is increasing in n. This is
illustrated by

ListLinePlot[Table[
	FindMaximum[Log[jointlikelihood[n, p]], {p, .05, .06}][[1]],
	{n, 11, 100}], DataRange -> {11, 100}]

Now compute f(D|n) by *integrating* p out (rather than *maximizing* p
out). (Although I specify the assumption n >= 11, the resulting
expression is valid for n < 11 as well: the likelihood of n for n < 11
is zero.) Even ignoring the prior for n (which downweights large
values of n), the marginal likelihood for n has a maximum (at n=53):

nlikelihood[n_] = Integrate[jointlikelihood[n, p] * pprior[p],
	{p, 0, 1}, Assumptions -> n >= 11];

Compute f(n|D) from f(D|n) * f(n). (I have chosen the upper bound of
300 for convenience; the prior probability of n > 300 is about
10^(-4).) From f(n|D), i.e., nposteriorList, the most likely value for
n is 36 (nmax) and, given n=nmax, the maximum value for p is about
0.11 (pmax):

nposteriorList = #/Total[#]& @ Table[nlikelihood[n] nprior[n], {n, 1,
300}];
ListLinePlot[nposteriorList, PlotRange -> All]
nmax = SortBy[Transpose[{Range[300], nposteriorList}], Last][[-1, 1]]
pmax = p /. Last@FindMaximum[Log[jointlikelihood[nmax, p]], {p, .1, .
15}]

Now compute f(n,p|D) by drawing from f(n|D) and f(p|n,D). By examining
the f(D|n,p), one can determine that f(p|n,D) =
PDF[BetaDistribution[776, 200 n - 774], p]. We see that n and p are
highly negatively correlated and the the most likely value for p is
noticably less than pmax:

draws = Table[With[{
	n = RandomChoice[nposteriorList[[11 ;;]] -> Range[11,300]]},
	{n, RandomReal[BetaDistribution[776, 200 n - 774]]}], {10^5}];
bc = BinCounts[draws, {.5, 300.5, 1}, {0, .35, .005}];
ListDensityPlot[Transpose[bc],
	ColorFunction -> (GrayLevel[1 - #^(.5)] &),
	DataRange -> {{1, 300}, {0, .35}}, PlotRange -> All]
ListLinePlot[Total[bc, {2}], DataRange -> {1, 300},
	GridLines -> {{nmax}, {}}, PlotRange -> All]
ListLinePlot[Total[bc], DataRange -> {0, .35},
	GridLines -> {{pmax}, {}}]

Now that we have the joint posterior distribution, we can compute the
posterior distribution of any function of n and p. For example the
mean of the binomial distribution, n*p, which is tightly concentrated
around mean of the data (3.875):

Through[{Mean, StandardDeviation}[Times @@@ draws]]

We can also compute the uncertainty associated with the binomial CDF
(which connects us up with fitting functional forms):

Needs["ErrorBarPlots`"]
meansd = (Through[{Mean, StandardDeviation}[#]] & /@
	Transpose[Function[{n, p},
	Table[CDF[BinomialDistribution[n, p], x], {x, 0, 15}] // Evaluate]
@@@
	Take[draws, 1000]]);
ErrorListPlot[Transpose[{
	Transpose[{Range[0, 15], meansd[[All, 1]]}], ErrorBar[2 #] & /@
	meansd[[All, 2]]}]]

In Bayesian analysis, one uses "decision theory" to get a "point
estimate". A point estimate minimizes the expected loss computed with
respect to the posterior distribution, where the "loss function"
characterizes the cost of choosing the point (n1,p1) when in fact
(n0,p0) is true. For example, quadratic loss leads to choosing the
posterior mean, while a "zero-one" loss function leads to choosing the
maximum posterior value.

The prior I chose above is intended to illustrate the analysis. In a
real-world situation with real-world data, one should incorporate what
one knows into the prior. For example, one might know from experience
that the probability that p is greater than 1/2 is quite small; or one
might know that n can't be larger than 35. Ultimately, if the thing
you care about is sensitive to the prior, then you should think
carefully about the appropriate prior.

--Mark
Prev by Date: Slow Show/Graphics in v6.0
Next by Date: Using Inset as a primitive in Graphics3D
Previous by thread: Re: fit a BinomialDistribution to exptl data?
Next by thread: Re: Re: fit a BinomialDistribution to exptl data?