MathGroup Archive: October 2002 [00517]

Re: Off by 0.00000001, Why?
To: mathgroup at smc.vnet.net
Subject: [mg37392] Re: [mg37378] Off by 0.00000001, Why?
From: Sseziwa Mukasa <mukasa at jeol.com>
Date: Sat, 26 Oct 2002 02:03:28 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com
On Friday, October 25, 2002, at 10:24 AM, Steven T. Hatton wrote:

> On Friday 25 October 2002 09:34 am, you wrote:
>

> This is not a high priority for me, but it would be nice to know how to
> identify the source of such discrepancies.
>

In a program as large, complex and (dare I say it?) closed source as
Mathematica, that would be nearly impossible.  Without access to the
actual instruction stream sent to the processor and knowledge of the
processor state, there is no way of determining the exact result of
anything but the simplest floating point expression.  In principle you
could attach a debugger to the executing code, but good luck determining
which machine instructions correspond to the expression of interest in
a program as large as Mathematica.

Using some small C or Fortran programs in a debugger may help you probe 
your machine's and compiler's idiosyncrasies but that's not much help 
in determining how Mathematica operates.

> I *believe* Java tries to create a platform neutral computing
> environment which ensures results will be uniform across platforms.
> IIRC, I read something about this kind of thing in the Mathematica
> documentation.  There are ways of manipulating the content of atoms
> which can be used to optimize performance, but they are discouraged
> because they can lead to the kinds of discrepancies we are
> discussing.... Indeed, see A.1.4 of the Mathematica Book: (4.2, Help
> Browser)
>

The information in A.1.4 seems to indicate that you have access to the
underlying bit pattern, in hexadecimal form, of a float, complex or
other atomic type.  It's not obvious to me that you can use these
results to manipulate floating point computations, though, since the Raw
representation is just that: a representation, not a pointer to the
actual data.  Anyway, the raw byte patterns of the data still don't give
you enough information.  Returning to the example of Intel x86 versus
most other architectures: if you set the processor state correctly,
multiplying two IEEE 754 floating point doubles (64-bit long float)
will result in an intermediate 80 bit IEEE floating point long double
(that's why the register is 80 bits wide), which is then rounded
back to an IEEE 754 floating point double when stored in memory.  That
trip from 64 to 80 bits and back is the source of many differences
between identical numerical code on an x86 chip and on other
architectures.
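
The effect of that extra rounding step can be imitated with exact
rational arithmetic.  What follows is a minimal Python sketch, not a
model of any particular processor: the helper round_to_bits and the
chosen value are made up for illustration, exponent range is ignored,
and round-to-nearest, ties-to-even is assumed at both widths (53
significant bits for a double, 64 for the 80 bit format).

```python
from fractions import Fraction


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant binary digits,
    ties to even.  Hypothetical helper; exponent range is ignored."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    lo, hi = Fraction(2) ** (bits - 1), Fraction(2) ** bits
    # Find e such that x / 2**e lies in [2**(bits-1), 2**bits).
    e = 0
    while x / Fraction(2) ** e >= hi:
        e += 1
    while x / Fraction(2) ** e < lo:
        e -= 1
    unit = Fraction(2) ** e
    # round() on a Fraction rounds halfway cases to the even integer.
    return round(x / unit) * unit


# Exact sum of the two doubles 1.0 and (2**-53 + 2**-70).
exact = 1 + Fraction(1, 2**53) + Fraction(1, 2**70)

# One rounding, straight to double precision.
direct = round_to_bits(exact, 53)

# Two roundings: to the 64 bit significand first, then to double.
via_extended = round_to_bits(round_to_bits(exact, 64), 53)

print(direct == 1 + Fraction(1, 2**52))
print(via_extended == 1)
```

Rounding once gives 1 + 2^-52, while rounding to 64 bits first lands
exactly on a halfway case and then rounds down to 1, so the two paths
differ in the last bit of the double.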

Java gets around this by disabling the 80 bit conversion.  I have tried
as much as possible to stay away from the details of x86 machine code
in my career, so I don't know the details of the process, but
I believe it is possible to turn off the 64 to 80 bit conversion on x86
chips.  However, many JVMs now do JIT compilation, which converts large
sections of Java code to optimized machine code for execution.
Depending on the JIT compiler and hardware you use, it is possible that
a complex expression could be evaluated in different orders depending
on the peculiarities of the pipeline structure of the processor.  These
guys (http://www.naturalbridge.com/floatingpoint/) seem to agree with
me about the effect of JITs on Java's reproducibility.  I'm not sure
what the Java standard says about this, but from page 41 of this
paper (http://java.sun.com/people/darcy/JavaOne/2001/1789darcy.pdf) it
seems there is a flag that makes a JVM produce equivalent results on
different architectures.  There is probably a large performance penalty
for this, though.

Finally, section 3.1.6 of the Mathematica book seems to indicate that
Mathematica will use whatever native floating point capabilities exist
on a particular hardware platform.  Again, this indicates to me that we
can expect machine precision values to have 80 bit intermediate values
in calculations on x86 (I'm not sure about IA-64) hardware.  Also, it is
probably safe to assume that the Mathematica kernel is compiled to take
advantage of the pipeline structure of whatever hardware it is running
on, and that machine precision expressions will be executed in the most
efficient order possible on a particular hardware platform.  The only
way to test this is to create an expression whose result depends
strongly on the order of operations implied by parentheses, then evaluate
the expression with different parenthesization (is that a word?) to see
if Mathematica respects the parentheses or internally reorders the
expression into the most efficient form.  I'm trying to think of an
example right now; I'll post the results when available.
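
To show the kind of expression meant here, the following is a small
Python sketch rather than Mathematica code.  It assumes ordinary IEEE
754 double arithmetic and only demonstrates that the grouping changes
the answer; it says nothing about what the Mathematica kernel actually
does with the same expression.  The variable names are arbitrary.

```python
big = 1e16  # adjacent doubles near 1e16 are 2 apart

# Cancel the large terms first, then add the small one.
left_first = (big + -big) + 1.0

# Add the small term to -big first; the 1.0 is lost in rounding.
right_first = big + (-big + 1.0)

print(left_first, right_first)
```

The first grouping yields 1.0 and the second 0.0.  An evaluator that
silently reordered the additions would turn one into the other, which
is what a test of this sort would look for.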

> I do seem to recall a lost probe not too long ago.  Seems someone
> forgot to convert from miles to kilometers, or something like that.
> To my mind that was just plain stupidity.  They should never have
> been using imperial units in the first place.
>

There is also the story of the Ariane rocket 
(http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html) 
that was lost due to not checking for out of range exceptions when 
converting from floating point values to integers.  No process is 
perfect, including auditing the code.
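
The missing check is easy to state in code.  Below is a minimal Python
sketch of a range-checked conversion; the function name and the 16 bit
signed target are assumptions chosen for illustration, not a
reconstruction of the flight software.

```python
import math

INT16_MIN, INT16_MAX = -2**15, 2**15 - 1  # assumed target width


def to_int16_checked(value: float) -> int:
    """Truncate toward zero, refusing values that do not fit in 16 bits."""
    if math.isnan(value) or math.isinf(value):
        raise OverflowError(f"cannot convert {value!r} to an integer")
    truncated = math.trunc(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} is out of range for a 16 bit integer")
    return truncated


print(to_int16_checked(123.9))
try:
    to_int16_checked(40000.0)
except OverflowError as err:
    print("rejected:", err)
```

The point is only that the out of range case is detected and reported
at the conversion rather than propagating as a wrong value.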

> Nonetheless, this demonstrates the kinds of things which can creep
> into your calculations.  Suppose the Auditors come in and bless
> everything off, and the next week you get a brand new computer. Not
> realizing that the hardware will influence the outcome of your
> calculation, you copy everything over to the new system, give the
> nice lady at WRI a call to get your new password, and run the
> calculations 10 times faster, but based on assumptions which are
> inconsistent with your current environment.  Ooops,  "Houston, we've
> got a problem."
>

The possibility of having to make that phone call is why I am glad I 
don't work for NASA ;-).

At any rate, if you were considering writing satellite launch or control
code with Mathematica, I suggest you examine the second paragraph of the
Limited Warranty on your license agreement, in particular "WRI does not
recommend the use of the software for applications in which errors or
omissions could threaten life, injury or significant loss."

Regards,

Ssezi