MathGroup Archive: November 2013 [00003]

[Date Index] [Thread Index] [Author Index]

Re: Mathematica not IEEE-754 compliant?

To: mathgroup at smc.vnet.net
Subject: [mg131935] Re: Mathematica not IEEE-754 compliant?
From: Alain Cochard <Alain.Cochard at unistra.fr>
Date: Sat, 2 Nov 2013 02:26:43 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
Delivered-to: l-mathgroup@wolfram.com
Delivered-to: mathgroup-outx@smc.vnet.net
Delivered-to: mathgroup-newsendx@smc.vnet.net
References: <l4n97h$b25$1@smc.vnet.net> <l4sjo2$pkv$1@smc.vnet.net>

Thank you very much for your answer!

danl at wolfram.com writes on Thu 31 Oct 2013 03:45:

 > I'm not convinced that one or the other is incorrect. It could be
 > the case, for example, that 80 bit registers at certain steps are
 > required to get one result whereas 64 bit registers, or storing
 > from 80 to 64 bit doubles in memory, will give the other. Whether
 > or how 80 bit registers get used will depend on settings at compile
 > time for the various programs being used.


Unfortunately, I am not familiar with 80 bit registers.  My --perhaps
naive-- understanding comes from the official document "IEEE Standard
for Binary Floating-Point Arithmetic", in which I read that

   All conforming implementations of this standard shall provide
   operations to add, subtract, multiply, divide, extract the square
   root, find the remainder, round to integer in floating-point
   format, convert between different floating-point formats, convert
   between floating-point and integer formats, convert binary <--->
   decimal, and compare. [...] Except for binary <---> decimal
   conversion, each of the operations shall be performed as if it
   first produced an intermediate result correct to infinite precision
   and with unbounded range, and then coerced this intermediate result
   to fit in the destination's format (see Sections 4 and 7).

Section 7 is dealing with exceptions, which should not be a concern
here. Also, as far as I understand, the issue I raised is not related
to binary <---> decimal conversion.  Section 4.1 is "Round to Nearest"
and says:

   An implementation of this standard shall provide round to nearest
   as the default rounding mode. In this mode the representable value
   nearest to the infinitely precise result shall be delivered;

So I do not understand how, on a system conforming to the standard,
the result of a single multiplication might or might not give the
closest machine number.

 > One thing I can point out is that the Mathematica result is
 > arguably the better of the two. Once you create 'num' and 'den' as
 > machine numbers, if you now take their bit sequences, make exact
 > rationals, and divide to high precision, then create the bit
 > sequence for the corresponding approximate value, you will recover
 > the 'bin' set of bits, not that of 'presumablyClosestInfPrec'. I
 > show this below.

 > 
 > In[39]:= Take[
 >   RealDigits[
 >     N[FromDigits[RealDigits[num, 2], 2]/
 >       FromDigits[RealDigits[den, 2], 2], 20], 2][[1]], 53] === bin[[1]]
 > 
 > Out[39]= True
 > 
 > Notice that I used N[...,20] to make sure to correctly reconstruct
 > that (exact) quotient to more than 53 correct bits.

Here too, I am probably missing something.  If I understand correctly,
you are saying that Mathematica's result is the better of the two because of
internal consistency.  But it seems to me that when you take the first
53 digits, you are simply "floor'ing" the number.  As, in my exemple,
the ordering is:

    Mma_result < exact_result < closest_number 

it is no surprise to me that you recover 'bin', but I do not see why
it shows consistency.

Here is another example with 

   closest_number < exact_result < Mma_result

numInfPrec=2^^11000001101111100011010100110001011111101100101111110*2^-53;
denInfPrec=2^^11100010111100000100110011100111001001000101100111011*2^-53;

num=1.*numInfPrec;
den=1.*denInfPrec;

res=num/den;
bin=RealDigits[res,2]

This gives:
   {{1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 
   0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 
   1, 1, 0, 1, 0, 0}, 0}

presumablyClosestInfPrec=
2^^11011010100011011010100001111010100001001001100110011*2^-53
(* This is the machine number immediately before the previous one *)

Check that it is indeed the machine number closest to the exact result: 

N[ { numInfPrec/denInfPrec - presumablyClosestInfPrec , 
         FromDigits[bin,2] - presumablyClosestInfPrec}, 3]//InputForm


This gives {3.998894021443906958645`3.*^-17, 
            1.1102230246251565404236`3.*^-16} 

Your piece of code:

Take[
  RealDigits[
    N[FromDigits[RealDigits[num, 2], 2]/
      FromDigits[RealDigits[den, 2], 2], 20], 2][[1]], 53] === bin[[1]]

gives "False", while 

Take[
  RealDigits[
    N[FromDigits[RealDigits[num, 2], 2]/
      FromDigits[RealDigits[den, 2], 2], 20], 2][[1]], 53] === 
                               RealDigits[presumablyClosestInfPrec,2,53] [[1]] 

gives "True".

Hopefully you can improve my understanding.

With best wishes,
Alain

-- 
EOST (Ã?cole et Observatoire des Sciences de la Terre) 
IPG (Institut de Physique du Globe) | alain.cochard at unistra.fr
5 rue RenÃ© Descartes   [bureau 106] | Phone: +33 (0)3 68 85 50 44 
F-67084 Strasbourg Cedex, France    | Fax:   +33 (0)3 68 85 01 25

Prev by Date: Solve PDE system with 3 variables

Next by Date: Re: Finding branches where general solution is possible

Previous by thread: Solve PDE system with 3 variables

Next by thread: Re: Mathematica not IEEE-754 compliant?