MathGroup Archive: July 2011 [00677]

Re: Cumulative probability question

To: mathgroup at smc.vnet.net
Subject: [mg120619] Re: Cumulative probability question
From: Daniel Lichtblau <danl at wolfram.com>
Date: Sun, 31 Jul 2011 07:26:14 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com

----- Original Message -----
> From: "Peter Sisak" <p-kun80 at hotmail.com>
> To: mathgroup at smc.vnet.net
> Sent: Saturday, July 30, 2011 4:59:41 AM
> Subject: Cumulative probability question
> I have been experimenting with Mathematica in order to receive answers
> to my problem, so far without success. I have tried various forms of
> CDF
> and HypergeometricDistribution, but I seem to not know the proper
> addressing of the parameters for the problem.
> 
> The problem itself is the following (with numbers given for a more
> illustrative example): Given an urn containing N(=70) items in total,
> of which n(=4) are marked, we want to draw i(=2) or more marked
> items.
> 
> a) What is the formula describing the number of draws required to
> succeed in drawing at least i items with a probability of at least
> 50%?
> b) How do you make a graph of the probability (of drawing at least i
> items) graphed against the number of draws?
> c) What are the equations that need to be solved to get a numerical
> answer on a) and b)?
> 
> Thank you for your assistance in advance
> Péter Sisák

I'll use notation n1 for the marked items, n2 for unmarked (so n1 is your n, and n1+n2 is your N, which is quite different from Mathematica's N).

Suppose you do m draws. The probability of drawing a marked element (one of the n1) on each of the first k draws, and an unmarked element (one of the n2) on each of the remaining m-k draws, is

Pochhammer[n1 - k + 1, k]*
 Pochhammer[n2 - (m - k) + 1, m - k]/Pochhammer[n1 + n2 - m + 1, m]

If this seems like a bizarre formula, rewrite it in factorials and it should make more sense. Assuming I have it correct.
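As a sketch of that rewriting, using the identity Pochhammer[a, k] = a (a+1) ... (a+k-1), the expression above in factorials would be:

```latex
\frac{n_1!}{(n_1-k)!}\cdot\frac{n_2!}{\bigl(n_2-(m-k)\bigr)!}\cdot\frac{(n_1+n_2-m)!}{(n_1+n_2)!}
```

The first two factors count the ordered ways to pick k marked and then m-k unmarked items; the last factor divides by the ordered ways to pick any m items out of all n1+n2.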

To get the probability of exactly k from the n1 marked elements appearing anywhere among the m draws, multiply by Binomial[m,k]. To get the probability of at least k marked elements appearing, sum the result from k up to either m or n1 (it does not matter which, because all terms beyond the smaller of the two are zero).

In[40]:= n1 = 4;
n2 = 66;
p[k_, m_, n1_, n2_] := 
 Binomial[m, k]*Pochhammer[n1 - k + 1, k]*
  Pochhammer[n2 - (m - k) + 1, m - k]/Pochhammer[n1 + n2 - m + 1, m]

In[43]:= prob[k_, m_, n1_, n2_] := Sum[p[j, m, n1, n2], {j, k, n1}]
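Since the original question mentions HypergeometricDistribution, here is a hedged cross-check against the built-in distribution. The name probBuiltin and the argument names marked and unmarked are illustrative only and not part of the code above; the argument order assumed for HypergeometricDistribution is (number of draws, marked items, total items).

```mathematica
(* Probability of at least k marked items in m draws,
   computed as 1 - P(fewer than k) from the built-in CDF. *)
probBuiltin[k_, m_, marked_, unmarked_] :=
 1 - CDF[HypergeometricDistribution[m, marked, marked + unmarked], k - 1]

(* Should agree with prob[2, m, 4, 66] for whole-number m *)
N[probBuiltin[2, 26, 4, 66]]  (* roughly 0.476 *)
N[probBuiltin[2, 27, 4, 66]]  (* roughly 0.502 *)
```

Unlike the Pochhammer form, this version is only meant for whole-number m, so it is a check rather than a replacement for the definition used with FindRoot.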

Your example:

In[44]:= FindRoot[prob[2, m, 4, 66] == 1/2, {m, 20}]
Out[44]= {m -> 26.92171019255432}

So you need 27 draws for the probability of getting 2 or more marked elements to reach at least one half.
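For part b) of the question, and to confirm the whole-number answer directly, one possible sketch follows. The helper name atLeast is hypothetical, and it uses the built-in HypergeometricDistribution instead of the Pochhammer formula so that the block stands alone.

```mathematica
(* Probability of at least k marked items in m draws (m a positive integer) *)
atLeast[k_, m_, marked_, unmarked_] :=
 1 - CDF[HypergeometricDistribution[m, marked, marked + unmarked], k - 1]

(* Smallest whole number of draws with probability >= 1/2;
   expected to give {27}, consistent with the root near 26.92 *)
Select[Range[1, 70], atLeast[2, #, 4, 66] >= 1/2 &, 1]

(* Probability of at least 2 marked items against the number of draws *)
ListPlot[Table[{m, atLeast[2, m, 4, 66]}, {m, 1, 70}],
 AxesLabel -> {"draws", "probability"}, PlotRange -> {0, 1}]
```

Scanning the integers avoids rounding a real-valued root by hand, at the cost of evaluating the probability for every candidate m.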

One can simulate these draws as follows. Take a random sample of m elements without replacement. See how many lie among the first n1 elements. If k or more do (k is your i), count the sample as passing the test. Then compute the fraction of samples that pass.

simulate[n1_, n_, k_, m_, reps_: 100] := 
 Count[Table[
    Count[RandomSample[Range[n], m], Alternatives @@ Range[n1]] >= 
     k, {reps}], True]/reps

The difference between 26 and 27 draws is noticeable. Not that I know what statistical tests to use in order to quantify this.

In[61]:= Table[simulate[n1, n1 + n2, 2, 26, 100] // N, {10}]
Out[61]= {0.55, 0.41, 0.49, 0.47, 0.46, 0.44, 0.49, 0.49, 0.41, 0.49}

In[62]:= Table[simulate[n1, n1 + n2, 2, 27, 100] // N, {10}]
Out[62]= {0.59, 0.51, 0.4, 0.48, 0.48, 0.53, 0.5, 0.47, 0.55, 0.43}
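One rough way to put a number on the scatter in these outputs is the standard error of an observed fraction, Sqrt[p (1 - p)/reps]. This is only a sketch under the assumption that the samples are independent; stdErr is a made-up helper name, not something defined above.

```mathematica
(* Standard error of an observed pass fraction when the true probability is p *)
stdErr[p_, reps_] := Sqrt[p (1 - p)/reps]

N[stdErr[1/2, 100]]    (* 0.05 *)
N[stdErr[1/2, 10000]]  (* 0.005 *)
```

With reps = 100 the noise in a single fraction (about 0.05) is larger than the gap of a few hundredths between the exact probabilities for 26 and 27 draws, so individual entries overlap; considerably more repetitions would be needed for single runs to separate the two cases cleanly.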

Here is a neat sort of test. Run this simulation many times. Count the entire run a success if at least half the runs show a success at least half the time.

success[n1_, n2_, k_, m_, reps_] := 
 Count[Table[simulate[n1, n1 + n2, k, m, reps], {reps}], 
   aa_ /; aa >= 1/2] >= reps/2

In[68]:= Table[success[n1, n2, 2, 26, 100], {10}]
Out[68]= {False, False, False, False, False, False, False, False, \
False, False}

In[69]:= Table[success[n1, n2, 2, 27, 100], {10}]
Out[69]= {True, True, True, True, True, False, True, True, True, True}

These results will have to speak for themselves, since I don't actually know their language.

Daniel Lichtblau
Wolfram Research
