Re: Random[BinomialDistribution[..]] wrong ?

*To*: mathgroup at christensen.cybernetics.net*Subject*: [mg1089] Re: Random[BinomialDistribution[..]] wrong ?*From*: withoff (David Withoff)*Date*: Sun, 14 May 1995 20:51:30 -0400*Organization*: Wolfram Research, Inc.

In article <3opku3$7h4 at news0.cybernetics.net> rubin at msu.edu (Paul A. Rubin) writes: >In article <3olugl$ql4 at news0.cybernetics.net>, > bzbkowal at ZIB-Berlin.DE (Axel Kowald) wrote: >->I'm using Mathematica 2.2 on a Macintosh and I got a problem generating >binomially >->distributed random numbers. >-> >->I have: >-> >->Needs["Statistics`DiscreteDistributions`"] >->SeedRandom[17] >->Table[Random[BinomialDistribution[10^6,0.01]],{5}] >->Out-> {998486, 998486, 998486, 998486, 998486} >-> >->Not only that the random numbers are all the same, most importantly they >are in a >->completely wrong range. One should get something around 10000 ! >->The problem seems to be 10^6, if I use something smaller, like 10000, >everything works >->fine. Furthermore the problem seems to be in the Random[] function since >->PDF[BinomialDistribution[10^6,0.01],n] works okay (but takes a long >time). >-> >->Any comments, hints, solutions ? > >I pasted your input statements into version 2.2.2 for Windows and got >something different (and a lot more reasonable): > > {9996, 9954, 9854, 10107, 9837} > >If another Mac user can reproduce the problem, then I would suspect it to >be a bug in the Mac kernel. > >Paul This is an example of a previously unknown (but not terribly surprising) Macintosh specific problem with machine-arithmetic evaluation of the BetaRegularized function. Random numbers with a binomial distribution are computed by computing binomial quantiles of uniformly distributed random numbers. Binomial quantiles are computed by interval-halving with the cumulative density function, and the cumulative density function is evaluated using BetaRegularized. (For more details, see the code in the Statistics`DiscreteDistributions` package. The problem can be traced to BetaRegularized with ordinary debugging methods.) For example, here are some expected (incorrect) results on a non-power-pc Macintosh: In[7]:= CDF[BinomialDistribution[10^6, .01], 9000] 17383 Out[7]= 5.5245656595 10 In[8]:= BetaRegularized[.99, 991000, 9001] 17383 Out[8]= 5.5245656595 10 The universal suggestion in case of difficulty with machine-arithmetic evaluation of special functions is to use variable-precision arithmetic. The idiosyncracies of machine arithmetic make evaluation of special functions very difficult, and as a practical matter it may be impossible to get reliable results for all of the special functions of Mathematica using machine arithmetic alone. This is one of the main reasons that variable-precision arithemtic is included in Mathematica. You can get good results, even on a non-power-pc Macintosh, for the above example using variable-precision arithmetic. In[9]:= N[BetaRegularized[99/100, 991000, 9001], 30] -25 Out[9]= 8.350415917 10 To roughly ensure that the accuracy of the cumulative density function is beyond the granularity of the 1000000 steps in CDF[BinomialDistribution[10^6,0.01], k], it is necessary for the accuracy of the CDF calculation to be at least 10 digits. (This isn't quite right, but it is close enough for the present discussion.) Here is a short example involving the use of a rule for BetaRegularized that causes the precision to be increased until it gets a result with an accuracy of at least 10 digits. In[1]:= Needs["Statistics`DiscreteDistributions`"] In[2]:= Unprotect[BetaRegularized] Out[2]= {BetaRegularized} In[3]:= BetaRegularized[z_?MachineNumberQ, a_Integer, b_Integer] := Module[{prec, result}, prec = 20; Print[{z, a, b}, " : trying Precision = ", prec]; While[prec <= $MaxPrecision, result = BetaRegularized[SetPrecision[z, prec], a, b]; If[Accuracy[result] >= 10, Return[result]]; prec += 40; Print[" Accuracy[result] = ", Accuracy[result], " : trying Precision = ", prec] ]; $Failed ] In[4]:= Protect[BetaRegularized] Out[4]= {BetaRegularized} In[5]:= Random[BinomialDistribution[10^6,0.01]] {0.99, 500000, 500001} : trying Precision = 20 {0.99, 750000, 250001} : trying Precision = 20 {0.99, 875000, 125001} : trying Precision = 20 {0.99, 937500, 62501} : trying Precision = 20 {0.99, 968750, 31251} : trying Precision = 20 {0.99, 984375, 15626} : trying Precision = 20 {0.99, 992188, 7813} : trying Precision = 20 {0.99, 988282, 11719} : trying Precision = 20 {0.99, 990235, 9766} : trying Precision = 20 Accuracy[result] = 6 : trying Precision = 60 Accuracy[result] = 9 : trying Precision = 100 {0.99, 989259, 10742} : trying Precision = 20 {0.99, 989747, 10254} : trying Precision = 20 Accuracy[result] = 7 : trying Precision = 60 {0.99, 989991, 10010} : trying Precision = 20 Accuracy[result] = 0 : trying Precision = 60 Accuracy[result] = 4 : trying Precision = 100 Accuracy[result] = 7 : trying Precision = 140 {0.99, 989869, 10132} : trying Precision = 20 Accuracy[result] = 5 : trying Precision = 60 Accuracy[result] = 9 : trying Precision = 100 {0.99, 989930, 10071} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 6 : trying Precision = 100 {0.99, 989961, 10040} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 5 : trying Precision = 100 Accuracy[result] = 9 : trying Precision = 140 {0.99, 989976, 10025} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 4 : trying Precision = 100 Accuracy[result] = 7 : trying Precision = 140 {0.99, 989969, 10032} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 5 : trying Precision = 100 Accuracy[result] = 9 : trying Precision = 140 {0.99, 989965, 10036} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 5 : trying Precision = 100 Accuracy[result] = 9 : trying Precision = 140 {0.99, 989963, 10038} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 5 : trying Precision = 100 Accuracy[result] = 9 : trying Precision = 140 {0.99, 989962, 10039} : trying Precision = 20 Accuracy[result] = 2 : trying Precision = 60 Accuracy[result] = 5 : trying Precision = 100 Accuracy[result] = 9 : trying Precision = 140 Out[5]= 10039 As you can see, calculation of even a single random number for BinomialDistribution[1000000, .01] can be something of an ordeal. This analysis is crude, and is intended only to be suggestive of what is going on, rather than a complete solution. There are many optimizations that could be added. For example, the cumulative density function of the binomial distribution uses the general BetaRegularized function, which is not optimized for the arguments that arise in this calculation. A stickier problem lurking the the background is that when the resolution of the arithmetic is comparible to the distance between steps in the cumulative density function, then the arithmetic will distort the distribution of random numbers. For BinomialDistribution[n, p], these steps are very close together, especially in the tails of the distribution. The most common way of dealing with this problem that I have seen is to approximate CDF[BinomialDistribution[n, p], k] for large values of n with something that is easier to calculate. Those large-n formulas that you see in elementary statistics books are useful not only because they make it easier to find things in common tables, but because they are easier to evaluate numerically. So my real recommendation, rather than crank up the precision for BetaRegularized, is to find a large-n approximation for quantiles of the binomial distribution. Dave Withoff Research and Development Wolfram Research