Re: Finding cardinality of set based on random selection from set

*To*: mathgroup at smc.vnet.net*Subject*: [mg75435] Re: Finding cardinality of set based on random selection from set*From*: Ray Koopman <koopman at sfu.ca>*Date*: Mon, 30 Apr 2007 03:43:04 -0400 (EDT)*References*: <26face530704242008sc104934o891814d2be51bcf1@mail.gmail.com>

On Apr 28, 2:55 am, "Kelly Jones" <kelly.terry.jo... at gmail.com> wrote: > How do I use Mathematica to solve this problem? > > I selected with replacement 100 times from a set of N objects. > The results: > > 1 item came up 4 times > 1 item came up 3 times > 10 items came up 2 times each > 73 items came up 1 time each > the other N-85 items did not come up at all > > Based on this histogram, what's the > 1) most likely value of N, and > 2) the probability distribution of N? > > Of course, I'm actually interested in the general solution: > if I select with replacement M times from a set of N objects, > and get a resulting histogram H, what's the probability > distribution of N with respect to M and H? Mathematica can do the scutwork for getting P[H|N], the conditional probability of the histogram given the set size; but P[N|H], the conditional probability of the set size given the histogram, is a Bayesian concept that requires an a priori estimate of the probability distribution of the set sizes. I'm switching all the variables to lowercase to avoid problems implementing this in Mathematica. You'll have to make up your own names for the several different probability functions "P" that are distinguished here by the symbols used for their arguments. Let s be a list in which s[[i]] = the number of times object i was selected. Then Length@s = n = set size, Total@s = m = number of draws, Total@Sign@s = k = the number of different objects selected, and the probability of s is P[s] = Multinomial@@s / n^m. Let h = Count[s,#]&/@Range@Max@s. Then h[[j]] = the number of items that came up j times, Total@h = k, h.Range@Length@h = m, and the probability of h is P[h|n] = P[s]*R[s], where R[s] = the number of rearrangements of s that give the same h. Both P[s] and R[s] can be expressed in terms of h,k,m,n, without reference to s itself, which is not observed. Multinomial@@s = Multinomial@@t, where t = Flatten@Table[j,{j,Length@h},{h[[j]]}], and R[s] = Binomial[n,k] Multinomial@@h, so we end up with P[h|n] = Multinomial@@t Binomial[n,k] Multinomial@@h / n^m. For h = {73,10,1,1}, the value of n that maximizes p[h|n] -- i.e., the maximum likelihood estimate of N -- is 296. However, the maximum likelihood estimate of N is not the "most likely value of N" that you requested. That would be the value of n that maximizes the Bayesian posterior probability P[h|n]*P[n] P[n|h] = --------------------. sum_n' P[h|n']*P[n'] To get P[n|h] you must supply a prior distribution P[n].