Re: Re: fit a BinomialDistribution to exptl data?

*To*: mathgroup at smc.vnet.net*Subject*: [mg80969] Re: [mg80915] Re: fit a BinomialDistribution to exptl data?*From*: DrMajorBob <drmajorbob at bigfoot.com>*Date*: Thu, 6 Sep 2007 05:35:39 -0400 (EDT)*References*: <11420600.1188458042685.JavaMail.root@m35> <6526955.1189001025956.JavaMail.root@m35>*Reply-to*: drmajorbob at bigfoot.com

Thanks for that! Very instructive. I'd only suggest that nmax = SortBy[Transpose[{Range[300], nposteriorList}], Last][[-1, 1]] is easier accomplished by nmax = First@Ordering[nposteriorList, -1] Bobby On Wed, 05 Sep 2007 01:58:09 -0500, Mark Fisher <particlefilter at gmail.com> wrote: > On Sep 1, 12:43 am, Ray Koopman <koop... at sfu.ca> wrote: >> Also, it is possible that >> the ML estimate of n may not exist because the likelihood, maximized >> w.r.t. p given n, may be monotone increasing in n, as is the case >> with the following 200 samples from a Binomial[20,.2] distribution: >> >> {x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, >> f = {3, 21, 28, 37, 43, 28, 22, 8, 6, 2, 1, 1}}; >> >> which has {mean, variance} = {3.875, 4.02952}. It is not obvious >> (except perhaps to a Bayesian) what should be done in cases such as >> this. > > It seems this thread, which began as a discussion of fitting a > functional forms, has evolved into a discussion of statistical > inference. (The two subjects are related, but distinct.) My post is > intended to contribute to the statistical inference discussion by > providing a Bayesian perspective. (The code is written using Version > 6.) > > The Bayesian approach produces probability statements about the > "unknowns" given the the data. In this context, the data are those > given by Ray Koopman and the unknowns are n and p, the parameters of > the binomial distribution. In order to make probability statements > about about the joint distribution for (n,p) given the data, the rules > of probability require us to specify a prior distribution that > characterizes what we knew before we saw the data. > > Some notation will be helpful. Let f(n,p) denote the joint prior > distribution, let f(n,p|D) denote the joint posterior distribution > given the data D, and let f(D|n,p) denote the sampling distribution of > the data given the parameters. (In a Bayesian analysis, the sampling > distribution becomes the "likelihood" of the parameters given the > data.) Bayes' rule says f(n,p|D) is proportional to g(n,p) := f(n,p) > * f(D|n,p). The typical procedure is to generate draws from f(n,p|D) > using MCMC techniques (Markov Chain Monte Carlo). > > In this example, I take the following route instead. First, factor the > prior into marginal and conditional: f(n,p) = f(n) * f(p|n). Second, > integrate p out of both sides of Bayes' rule, so that f(n|D) is > proportional to g(n) := f(D|n) * f(n). Finally, generate draws from > the joint posterior by drawing successively from f(n|D) and then from > f(p|n,D), the latter of which is proportional to g(p) := f(D|n,p) * > f(p|n). > > For this example, I characterize the prior knowledge about n and p as > follows: (1) n and p are independent, so that f(n,p) = f(n) * f(p), > (2) all values of p are equally likely, so that f(p) = 1 for 0 <= p <= > 1, and (3) the probability of n is given by f(n) = .03*(1-.03)^(n-1) . > (I'll comment on these choices at the end.) > > pprior[p_] = 1 > nprior[n_] = .03*(1-.03)^(n-1) > > The data are those given by Ray Koopman: > > data0 = Transpose[{ > {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, > {3, 21, 28, 37, 43, 28, 22, 8, 6, 2, 1, 1}}]; > data = Flatten[ConstantArray @@@ data0]; > > The likelihood: > > jointlikelihood[n_, p_] = > Times @@ (PDF[BinomialDistribution[n, p], #] & /@ data); > > An aside: Ray Koopman's comment about the absence of a maximum refers > to h(n) := max_p f(D|n,p), which is increasing in n. This is > illustrated by > > ListLinePlot[Table[ > FindMaximum[Log[jointlikelihood[n, p]], {p, .05, .06}][[1]], > {n, 11, 100}], DataRange -> {11, 100}] > > Now compute f(D|n) by *integrating* p out (rather than *maximizing* p > out). (Although I specify the assumption n >= 11, the resulting > expression is valid for n < 11 as well: the likelihood of n for n < 11 > is zero.) Even ignoring the prior for n (which downweights large > values of n), the marginal likelihood for n has a maximum (at n=53): > > nlikelihood[n_] = Integrate[jointlikelihood[n, p] * pprior[p], > {p, 0, 1}, Assumptions -> n >= 11]; > > Compute f(n|D) from f(D|n) * f(n). (I have chosen the upper bound of > 300 for convenience; the prior probability of n > 300 is about > 10^(-4).) From f(n|D), i.e., nposteriorList, the most likely value for > n is 36 (nmax) and, given n=nmax, the maximum value for p is about > 0.11 (pmax): > > nposteriorList = #/Total[#]& @ Table[nlikelihood[n] nprior[n], {n, 1, > 300}]; > ListLinePlot[nposteriorList, PlotRange -> All] > nmax = SortBy[Transpose[{Range[300], nposteriorList}], Last][[-1, 1]] > pmax = p /. Last@FindMaximum[Log[jointlikelihood[nmax, p]], {p, .1, . > 15}] > > Now compute f(n,p|D) by drawing from f(n|D) and f(p|n,D). By examining > the f(D|n,p), one can determine that f(p|n,D) = > PDF[BetaDistribution[776, 200 n - 774], p]. We see that n and p are > highly negatively correlated and the the most likely value for p is > noticably less than pmax: > > draws = Table[With[{ > n = RandomChoice[nposteriorList[[11 ;;]] -> Range[11,300]]}, > {n, RandomReal[BetaDistribution[776, 200 n - 774]]}], {10^5}]; > bc = BinCounts[draws, {.5, 300.5, 1}, {0, .35, .005}]; > ListDensityPlot[Transpose[bc], > ColorFunction -> (GrayLevel[1 - #^(.5)] &), > DataRange -> {{1, 300}, {0, .35}}, PlotRange -> All] > ListLinePlot[Total[bc, {2}], DataRange -> {1, 300}, > GridLines -> {{nmax}, {}}, PlotRange -> All] > ListLinePlot[Total[bc], DataRange -> {0, .35}, > GridLines -> {{pmax}, {}}] > > Now that we have the joint posterior distribution, we can compute the > posterior distribution of any function of n and p. For example the > mean of the binomial distribution, n*p, which is tightly concentrated > around mean of the data (3.875): > > Through[{Mean, StandardDeviation}[Times @@@ draws]] > > We can also compute the uncertainty associated with the binomial CDF > (which connects us up with fitting functional forms): > > Needs["ErrorBarPlots`"] > meansd = (Through[{Mean, StandardDeviation}[#]] & /@ > Transpose[Function[{n, p}, > Table[CDF[BinomialDistribution[n, p], x], {x, 0, 15}] // Evaluate] > @@@ > Take[draws, 1000]]); > ErrorListPlot[Transpose[{ > Transpose[{Range[0, 15], meansd[[All, 1]]}], ErrorBar[2 #] & /@ > meansd[[All, 2]]}]] > > In Bayesian analysis, one uses "decision theory" to get a "point > estimate". A point estimate minimizes the expected loss computed with > respect to the posterior distribution, where the "loss function" > characterizes the cost of choosing the point (n1,p1) when in fact > (n0,p0) is true. For example, quadratic loss leads to choosing the > posterior mean, while a "zero-one" loss function leads to choosing the > maximum posterior value. > > The prior I chose above is intended to illustrate the analysis. In a > real-world situation with real-world data, one should incorporate what > one knows into the prior. For example, one might know from experience > that the probability that p is greater than 1/2 is quite small; or one= > might know that n can't be larger than 35. Ultimately, if the thing > you care about is sensitive to the prior, then you should think > carefully about the appropriate prior. > > --Mark > > > -- = DrMajorBob at bigfoot.com