Re: Cumulative probability question
- To: mathgroup at smc.vnet.net
- Subject: [mg120619] Re: Cumulative probability question
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Sun, 31 Jul 2011 07:26:14 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
----- Original Message -----
> From: "Peter Sisak" <p-kun80 at hotmail.com>
> To: mathgroup at smc.vnet.net
> Sent: Saturday, July 30, 2011 4:59:41 AM
> Subject: Cumulative probability question
> I have been experimenting with Mathematica in order to receive answers
> to my problem, so far without success. I have tried various forms of
> CDF
> and HypergeometricDistribution, but I seem to not know the proper
> addressing of the parameters for the problem.
>
> The problem itself is the following (with numbers given for a more
> illustrative example): Given an urn containing N(=70) items in total,
> of which n(=4) are marked, we want to draw i(=2) or more marked
> items.
>
> a) What is the formula describing the number of draws required to
> succeed in drawing at least i items with a probability of at least
> 50%?
> b) How do you make a graph of the probability (of drawing at least i
> items) graphed against the number of draws?
> c) What are the equations that need to be solved to get a numerical
> answer on a) and b)?
>
> Thank you for your assistance in advance
> P=E9ter Sis=E1k
I'll use notation n1 for the marked items, n2 for unmarked (so n1 is your n, and n1+n2 is your N, which is quite different from Mathematica's N).
Suppose you do m draws. The probability of drawing from the n1 marked elements, in the first k draws, and drawing from the n2 unmarked on the remaining m-k draws, is
Pochhammer[n1 - k + 1, k]*
Pochhammer[n2 - (m - k) + 1, m - k]/Pochhammer[n1 + n2 - m + 1, m]
If this seems like a bizarre formula, rewrite it in factorials and it should make more sense. Assuming I have it correct.
To get the probability of exactly k form the n1 marked elements, appearing in any of the m draws, multiply by Binomial[m,k]. To get the probability of at least k marked elements appearing, sum the result from k to either of m or n1 (does not matter which because all terms after the min are zero).
In[40]:= n1 = 4;
n2 = 66;
p[k_, m_, n1_, n2_] :=
Binomial[m, k]*Pochhammer[n1 - k + 1, k]*
Pochhammer[n2 - (m - k) + 1, m - k]/Pochhammer[n1 + n2 - m + 1, m]
In[43]:= prob[k_, m_, n1_, n2_] := Sum[p[j, m, n1, n2], {j, k, n1}]
Your example:
In[44]:= FindRoot[prob[2, m, 4, 66] == 1/2, {m, 20}]
Out[44]= {m -> 26.92171019255432}
So it's 27 draws you need to hit an expected 2 or more marked elements at least half the time.
One can simulate these draws as follows. Take a random sample of m elements without replacement. See how many lie among the first n1 elements. If i or more, count the sample as passing the test. Then count how many tests pass on average.
simulate[n1_, n_, k_, m_, reps_: 100] :=
Count[Table[
Count[RandomSample[Range[n], m], Alternatives @@ Range[n1]] >=
k, {reps}], True]/reps
the difference between 26 and 27 draws is noticeable. Not that I know what stats tests to use in order to quantify this.
In[61]:= Table[simulate[n1, n1 + n2, 2, 26, 100] // N, {10}]
Out[61]= {0.55, 0.41, 0.49, 0.47, 0.46, 0.44, 0.49, 0.49, 0.41, 0.49}
In[62]:= Table[simulate[n1, n1 + n2, 2, 27, 100] // N, {10}]
Out[62]= {0.59, 0.51, 0.4, 0.48, 0.48, 0.53, 0.5, 0.47, 0.55, 0.43}
Here is a neat sort of test. Run this simulation many times. Count the entire run a success if at least half the runs show a success at least half the time.
success[n1_, n2_, k_, m_, reps_] :=
Count[Table[simulate[n1, n1 + n2, k, m, reps], {reps}],
aa_ /; aa >= 1/2] >= reps/2
In[68]:= Table[success[n1, n2, 2, 26, 100], {10}]
Out[68]= {False, False, False, False, False, False, False, False, \
False, False}
In[69]:= Table[success[n1, n2, 2, 27, 100], {10}]
Out[69]= {True, True, True, True, True, False, True, True, True, True}
These results will have to speak for themselves, since I don't actually know their language.
Daniel Lichtblau
Wolfram Research