MathGroup Archive 1990

[Date Index] [Thread Index] [Author Index]

Search the Archive

High precision floating point considered hazardous

  • To: mathgroup at
  • Subject: High precision floating point considered hazardous
  • From: Steve Christensen <uunet!!stevec>
  • Date: Fri, 27 Apr 90 10:02:32 -0500

Suppose that I double a floating point number then subtract off the
original number.  I ought to get the original number back, with maybe
the LSB being trashed.  If I do this N times, the introduced error
should be at most N times the weight of the LSB.  Well, Mathematica's
high precision floating point numbers don't work that way!  In fact,
the way they work is downright hazardous, as I found out.

In[1]:= a=1.1111111111111111

Out[1]= 1.1111111111111111

In[2]:= Do[a=2a-a,{55}]

In[3]:= a

Out[3]= 0.

Wow, just 55 iterations got a zero, not a 1.111... with some error.
Furthermore that zero is a black hole:

In[4]:= a+1

Out[4]= 0.

Everything works fine if we use a machine float instead of a big float:

In[5]:= b=1.111111111

Out[5]= 1.111111111

In[6]:= Do[b=2b-b,{1000}]

In[7]:= b

Out[7]= 1.11111

In[8]:= b-%5

Out[8]= 0.

The problem is that each iteration ratchets down the precision by one
until it eventually gets to zero.

Although it is less serious, there are also constructs that ratchet up
the precision, which, by the way, indicates that the precision
calculation cannot be quite correct.

In[35]:= p=N[Pi,20]

Out[35]= 3.1415926535897932385

In[36]:= Do[p=(p+p)/2,{100}]

In[37]:= p

Out[37]= 3.1415926535897932384626445913472373472822072149898

In[39]:= Precision[p]

Out[39]= 50

In[40]:= N[Pi,50]

Out[40]= 3.1415926535897932384626433832795028841971693993751

You might think that you never use big floats, so you don't have to
worry.  That's what I thought, too.  Unfortunately, the function Fit
sometimes outputs a function with coefficients that are big floats,
even when only passed machine floats in the input:

In[24]:= Fit[{0., 0.00694444, 0.0165094, 0.0278869},{x^3,

Out[24]= -0.00351594 + 0.0015323633333333 x + 0.00211823999999999 x  - 
>    0.00013466333333333 x

(Those are the actual numbers that caused the problem that I ran into.)

The ratcheting up of precision may be a performance problem, but the
ratcheting down of precision is downright disastrous.  I can suggest
two approaches: either take care to avoid big floats altogether in
calculations where each iteration uses the results of the previous
iteration, or if you really need higher precision, figure out in
advance what precision you want, then restore the precision of every
variable at the end of each iteration with something like 
x = SetPrecision[x,desiredPrecision]

The immediate problem arises from the way Mathematica computes the
precision/accuracy of the result of an addition: it seems to make the
accuracy of the result be the accuracy of the least accurate addend.

Now for some opinions:

There are at least 3 different notions of precision.

1.  A bound on the error introduced in each computation (including
conversion to floating point) leading up to the present value.  Thus a
precision of 16 digits (53 bits) means the result is as
accurate/precise as could have been expected if 53 bit mantissas had
been used in all calculations leading to the number.  This says
nothing about how close the number represented is to the ideal number
that would have occurred if all arithmetic in an entire computation
were exact; rather, it says that in each individual computation at
most so much error is introduced by the arithmetic and by forcing
results back into variables of this much "precision".

2.  An upper bound on the relative difference between the number
actually represented and that which would have occurred if each
computation were exact.  Thus 1.6453 with a precision of 4 means that,
had all arithmetic been exact, the result would have been in the range
1.6453 +/- .00016453.  

3.  An estimate of the uncertainty of the output of a calculation
based on the uncertainty of measured values.  This is commonly used in
physics labs.  (The mumblefoo distance was 76.2 +/- .1 cm.)  Thus if
from an experiment computing the acceleration of gravity, the
calculation yields 9.81 with a precision of 3, the true acceleration
of gravity lies in the range 9.815 +/- .009815, assuming there are no
systematic errors in the experiment.

Mathematica seems to be trying for meaning two.  (Although the policy of
making the precision of a literal be the number of digits typed in
shows vestiges of meaning 3.)  As evidence for this claim note that the
precision of a result of a subtraction is less than either argument:

In[41]:= q = 1.11111111111111111111111111111

Out[41]= 1.11111111111111111111111111111

In[42]:= 1025 q - 1024 q

Out[42]= 1.11111111111111111111111111

Also N[] reduces but never increases precision, presumably based on
the idea that it shouldn't give the illusion of precision that isn't
already there.

Now the question is, what ought Mathematica to provide?  (Disclaimer,
I'm no expert in this stuff.  Other people have undoubtedly written
entire dissertations on the topic.)  It ought to be useful, free of
ratcheting, either up or down, and reasonable to implement.  It also
ought to have an appealingly simple definition, and handle exact
numbers (numbers with a precision of Infinity) without any ugly
special cases that break the spirit of the definition.

I'll dispose of definition 3 (propagation of input uncertainty to
output) right off: it should be done by the user, possibly with the
help of a Mathematica package.  

I believe that correctly implementing a system based on definition 2
is virtually impossible.  While it is can be done locally for each
operation, the composite precision for an entire computation fails to
give the right answer.  To see this, suppose that x is some variable,
and in the early stages an error of epsilon was introduced, so we have
x = x' + epsilon, where x' is the ideal value.  Then we compute y=x,
then eventually y-x.  We ought to get a true zero.  But since the
system does not know that the error in x is always the same as the
error in y, it must assume the worst case, namely that they could be
of opposite sign, and indicate the result to be 0 +/- 2 epsilon.  As
the example involving Pi indicates, Mathematica doesn't even always get
the calculated precision right.  (I also conjecture that even getting it
locally right requires that the accuracy be carried to several
significant bits, but I'll skip that discussion.)

What the user really wants is to ensure that his/her calculations are
done with enough precision (definition 1 sense) that errors introduced
by arithmetic and by forcing numbers into some representation are
negligible for his/her application.  But since Mathematica is not a
compiled language with declarations for every variable (and since in
any case there are still intermediate results that need to be
represented), we want a reasonable precision to be automatically
generated for every operation.

I suggest that a good definition of "precision" is roughly definition
1.  (Admittedly, its not really a property of a number, but of a
computation.  Perhaps we need a different word.)  Intuitively a number
has a certain precision, P, if it is as accurate/precise as one could
have expected had P bits of mantissa been used in all
computations/conversions leading up to it.  (It may be convenient to
display/accept precision in decimal digits, but in this discussion we
are using bits.)  We need rules for the precision of the results of
the various operations.  I suggest the following.  When converting to
a float, the precision is the number of bits the mantissa is truncated
to.  The precision of the output of the basic arithmetic opertions is
the minimum of the precision of the operands.

The basic defintion is obviously satisfied, since the reported
precision is the minimum of all the precisions involved in the entire

What properties does this have?

1.  Operations on a pair of numbers of the same precision leaves the
precision unchanged.

2.  The precision can never ratchet up or down.

3.  There is no special case for exact numbers.  The can be converted
to the precision of the other operand and the operation can proceed.

4.  If input is always converted to at least the precision of machine
floats, then machine floats are in no way a special case.  If some big
float or exact number is to be combined with a machine float, just
convert it to a machine float and proceed with the two machine floats.

  -- David Jacobson

Postscript: I spent many hours working on an alternate set of rules
for addition and subtraction.  This set of rules would have allowed an
addition as shown in the example below to keep the precision of the
large, high precision number.  (The picture shows the addition of the
mantissas after shifting. A "d" indicates a bit actually available in
the data.)


Unfortunately, proving that it satisfied the basic definition required
that doing a calculation to N digits, then truncating down to a
smaller M, be considered equivalent to having done the calculation in
M digits all along---something I know is false.  If there is any
interest in this topic, someone just send mail directly to me and I'll
post the idea.

  • Prev by Date: NumberQ
  • Next by Date: Re: NumberQ
  • Previous by thread: Re: NumberQ
  • Next by thread: Re: High precision floating point considered hazardous