High precision floating point considered hazardous

*To*: mathgroup at yoda.ncsa.uiuc.edu
*Subject*: High precision floating point considered hazardous
*From*: Steve Christensen <uunet!yoda.ncsa.uiuc.edu!stevec>
*Date*: Fri, 27 Apr 90 10:02:32 -0500

Suppose that I double a floating point number and then subtract off the
original number. I ought to get the original number back, with maybe the
LSB being trashed. If I do this N times, the introduced error should be
at most N times the weight of the LSB. Well, Mathematica's high precision
floating point numbers don't work that way! In fact, the way they work is
downright hazardous, as I found out.

In[1]:= a=1.1111111111111111
Out[1]= 1.1111111111111111

In[2]:= Do[a=2a-a,{55}]

In[3]:= a
Out[3]= 0.

Wow, just 55 iterations got a zero, not a 1.111... with some error.
Furthermore, that zero is a black hole:

In[4]:= a+1
Out[4]= 0.

Everything works fine if we use a machine float instead of a big float:

In[5]:= b=1.111111111
Out[5]= 1.111111111

In[6]:= Do[b=2b-b,{1000}]

In[7]:= b
Out[7]= 1.11111

In[8]:= b-%5
Out[8]= 0.

The problem is that each iteration ratchets the precision down by about
one bit until it eventually reaches zero. Although it is less serious,
there are also constructs that ratchet the precision up, which, by the
way, indicates that the precision calculation cannot be quite correct:

In[35]:= p=N[Pi,20]
Out[35]= 3.1415926535897932385

In[36]:= Do[p=(p+p)/2,{100}]

In[37]:= p
Out[37]= 3.1415926535897932384626445913472373472822072149898

In[39]:= Precision[p]
Out[39]= 50

In[40]:= N[Pi,50]
Out[40]= 3.1415926535897932384626433832795028841971693993751

You might think that you never use big floats, so you don't have to
worry. That's what I thought, too. Unfortunately, the function Fit
sometimes outputs a function whose coefficients are big floats, even
when it is passed only machine floats in the input:

In[24]:= Fit[{0., 0.00694444, 0.0165094, 0.0278869},{x^3, x^2,x,1},x]
Out[24]= -0.00351594 + 0.0015323633333333 x + 0.00211823999999999 x^2 -
           0.00013466333333333 x^3

(Those are the actual numbers that caused the problem that I ran into.)
The ratcheting up of precision may be a performance problem, but the
ratcheting down of precision is downright disastrous.
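The drain can be reproduced outside Mathematica with a toy model. The
sketch below (Python, purely illustrative; the (value, precision) pairing
and the accuracy rule are my inference from the behavior above, not
Mathematica's documented internals) models a big float as a value plus a
decimal precision, and applies the rule that the accuracy of a difference
is the accuracy of its least accurate operand:

```python
from math import log10

def sub(x, y):
    """Difference of two (value, precision) pairs under the inferred rule:
    accuracy(result) = min accuracy of the operands, where
    accuracy = precision - log10(|value|)."""
    (xv, xp), (yv, yp) = x, y
    value = xv - yv
    acc = min(xp - log10(abs(xv)), yp - log10(abs(yv)))
    # Convert the surviving accuracy back to a precision; never below 0.
    prec = max(acc + log10(abs(value)), 0.0) if value else 0.0
    return (value, prec)

a = (1.1111111111111111, 17.0)    # 17 typed digits -> precision 17
for _ in range(55):
    a = sub((2 * a[0], a[1]), a)  # doubling by an exact 2 keeps a's precision

# Each pass loses log10(2) ~ 0.301 digits, so 55 passes drain essentially
# all 17 digits, matching the Out[3]= 0. above.
print(a[0], round(a[1], 3))
```

The value itself never changes (2a - a is exact in binary); only the
bookkeeping drains, which is exactly the hazard the post describes.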
I can suggest two approaches: either take care to avoid big floats
altogether in calculations where each iteration uses the results of the
previous iteration, or, if you really need higher precision, figure out
in advance what precision you want, then restore the precision of every
variable at the end of each iteration with something like

    x = SetPrecision[x, desiredPrecision]

The immediate problem arises from the way Mathematica computes the
precision/accuracy of the result of an addition: it seems to make the
accuracy of the result be the accuracy of the least accurate addend.

Now for some opinions. There are at least 3 different notions of
precision:

1. A bound on the error introduced in each computation (including
conversion to floating point) leading up to the present value. Thus a
precision of 16 digits (53 bits) means the result is as accurate/precise
as could have been expected if 53-bit mantissas had been used in all
calculations leading to the number. This says nothing about how close
the number represented is to the ideal number that would have occurred
if all arithmetic in an entire computation were exact; rather, it says
that in each individual computation at most so much error is introduced
by the arithmetic and by forcing results back into variables of this
much "precision".

2. An upper bound on the relative difference between the number actually
represented and the one that would have occurred if each computation
were exact. Thus 1.6453 with a precision of 4 means that, had all
arithmetic been exact, the result would have been in the range
1.6453 +/- .00016453.

3. An estimate of the uncertainty of the output of a calculation based
on the uncertainty of measured values. This is commonly used in physics
labs. (The mumblefoo distance was 76.2 +/- .1 cm.)
Thus if, from an experiment computing the acceleration of gravity, the
calculation yields 9.81 with a precision of 3, then the true
acceleration of gravity lies in the range 9.81 +/- .00981, assuming
there are no systematic errors in the experiment.

Mathematica seems to be trying for meaning two. (Although the policy of
making the precision of a literal be the number of digits typed in shows
vestiges of meaning 3.) As evidence for this claim, note that the
precision of the result of a subtraction is less than that of either
argument:

In[41]:= q = 1.11111111111111111111111111111
Out[41]= 1.11111111111111111111111111111

In[42]:= 1025 q - 1024 q
Out[42]= 1.11111111111111111111111111

Also, N[] reduces but never increases precision, presumably based on the
idea that it shouldn't give the illusion of precision that isn't already
there.

Now the question is, what ought Mathematica to provide? (Disclaimer: I'm
no expert in this stuff. Other people have undoubtedly written entire
dissertations on the topic.) It ought to be useful, free of ratcheting
either up or down, and reasonable to implement. It also ought to have an
appealingly simple definition, and it should handle exact numbers
(numbers with a precision of Infinity) without any ugly special cases
that break the spirit of the definition.

I'll dispose of definition 3 (propagation of input uncertainty to
output) right off: it should be done by the user, possibly with the help
of a Mathematica package.

I believe that correctly implementing a system based on definition 2 is
virtually impossible. While it can be done locally for each operation,
the composite precision for an entire computation fails to give the
right answer. To see this, suppose that x is some variable, and in the
early stages an error of epsilon was introduced, so we have
x = x' + epsilon, where x' is the ideal value. Then we compute y = x,
and eventually y - x. We ought to get a true zero.
But since the system does not know that the error in x is always the
same as the error in y, it must assume the worst case, namely that they
could be of opposite sign, and indicate the result to be
0 +/- 2 epsilon. As the example involving Pi indicates, Mathematica
doesn't even always get the calculated precision right. (I also
conjecture that even getting it locally right requires that the accuracy
be carried to several significant bits, but I'll skip that discussion.)

What the user really wants is to ensure that his/her calculations are
done with enough precision (in the definition 1 sense) that errors
introduced by arithmetic and by forcing numbers into some representation
are negligible for his/her application. But since Mathematica is not a
compiled language with declarations for every variable (and since in any
case there are still intermediate results that need to be represented),
we want a reasonable precision to be generated automatically for every
operation.

I suggest that a good definition of "precision" is roughly definition 1.
(Admittedly, it's not really a property of a number, but of a
computation. Perhaps we need a different word.) Intuitively, a number
has a certain precision, P, if it is as accurate/precise as one could
have expected had P bits of mantissa been used in all
computations/conversions leading up to it. (It may be convenient to
display/accept precision in decimal digits, but in this discussion we
are using bits.)

We need rules for the precision of the results of the various
operations. I suggest the following. When converting to a float, the
precision is the number of bits the mantissa is truncated to. The
precision of the output of the basic arithmetic operations is the
minimum of the precisions of the operands. The basic definition is
obviously satisfied, since the reported precision is the minimum of all
the precisions involved in the entire calculation.

What properties does this have?

1.
Operations on a pair of numbers of the same precision leave the
precision unchanged.

2. The precision can never ratchet up or down.

3. There is no special case for exact numbers. They can be converted to
the precision of the other operand and the operation can proceed.

4. If input is always converted to at least the precision of machine
floats, then machine floats are in no way a special case. If some big
float or exact number is to be combined with a machine float, just
convert it to a machine float and proceed with the two machine floats.

-- David Jacobson

Postscript: I spent many hours working on an alternate set of rules for
addition and subtraction. This set of rules would have allowed an
addition such as the one shown below to keep the precision of the large,
high precision number. (The picture shows the addition of the mantissas
after shifting. A "d" indicates a bit actually available in the data.)

    1ddddddddddddddddddddd
  +            1dddddddd

Unfortunately, proving that it satisfied the basic definition required
that doing a calculation to N digits, then truncating down to a smaller
M, be considered equivalent to having done the calculation in M digits
all along---something I know is false. If there is any interest in this
topic, just send mail directly to me and I'll post the idea.
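The correlation problem with definition 2, from the y - x example above,
can be seen with ordinary interval arithmetic. A minimal Python sketch
(illustrative only, not any particular library):

```python
# Toy interval arithmetic, illustrating why definition 2 overestimates:
# an interval cannot record that y carries *exactly* the same error as x,
# so y - x comes out as 0 +/- 2 epsilon instead of a true zero.
def interval(center, eps):
    return (center - eps, center + eps)

def isub(a, b):
    # Worst case: the two errors could be of opposite sign.
    return (a[0] - b[1], a[1] - b[0])

eps = 2.0 ** -20          # a small error, exactly representable in binary
x = interval(2.0, eps)    # x = x' + epsilon for some unknown epsilon
y = x                     # y is computed from x: same error, same interval
lo, hi = isub(y, x)
print((lo, hi))           # -> (-2*eps, 2*eps), not the true zero
```

The true answer is exactly 0, but the reported width is 4 epsilon,
because the worst-case rule must ignore the correlation between the
operands.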
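The rules proposed above can be sketched in a few lines of Python. The
BF class here is my toy rendering of the stated rules (min of operand
precisions; exact numbers adopt the other operand's precision), not an
implementation of Mathematica:

```python
import math

class BF:
    """Toy big float: a value tagged with a precision in mantissa bits.
    precision = math.inf marks an exact number."""
    def __init__(self, value, precision=math.inf):
        self.value = value
        self.precision = precision

    def _op(self, other, fn):
        # Exact literals are wrapped with infinite precision, so min()
        # makes them simply adopt the other operand's precision (rule 3).
        other = other if isinstance(other, BF) else BF(other)
        return BF(fn(self.value, other.value),
                  min(self.precision, other.precision))

    def __add__(self, other): return self._op(other, lambda a, b: a + b)
    def __sub__(self, other): return self._op(other, lambda a, b: a - b)
    def __rmul__(self, other):  # exact integer times BF keeps the precision
        return BF(other * self.value, self.precision)

a = BF(1.1111111111111111, precision=53)  # a 53-bit machine float
for _ in range(1000):
    a = 2 * a - a        # the loop from In[2], under the proposed rules

print(a.value, a.precision)  # precision never ratchets: still 53
```

Under these rules the 2a - a loop is harmless: the precision stays
pinned at the minimum ever involved, so it can neither drain to zero as
in In[2] nor inflate as in the Pi example.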