Re: Interval vs default, was Re: Mathematica numerics and... Re: Applying
- To: mathgroup at smc.vnet.net
- Subject: [mg131025] Re: Interval vs default, was Re: Mathematica numerics and... Re: Applying
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Tue, 4 Jun 2013 02:01:04 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- Delivered-to: l-mathgroup@wolfram.com
- Delivered-to: mathgroup-outx@smc.vnet.net
- Delivered-to: mathgroup-newsendx@smc.vnet.net
- References: <kmngb2$3rv$1@smc.vnet.net> <20130519095011.606CD6A14@smc.vnet.net>
I'm clipping quite a lot as this is already too long. Also some points were clarified, in both directions, off-line and I won't go into all of them. I'm not trying to respond to every point raised, just to a few that I think are compelling. And, alas, this is probably way too technical anyway. > There is a wikipedia entry on the topic that (at least as of this time > and date) is pretty good. The Wikipedia article in question is this. http://en.wikipedia.org/wiki/Significance_arithmetic In brief, I don't much like it. Little is said about how significance arithmetic really works (using first order approximations from Taylor series). And it seems to mix "significant figures arithmetic" with significance arithmetic. That is, to say the least, confusing. > The important quality of interval arithmetic that is not preserved is > that interval arithmetic (done correctly) preserves validity. That is, > the correct answer is guaranteed to be enclosed in the result. This is true but I think less important than one might think at first glance. It is certainly necessary for some purposes. e.g. mathematical proofs, to have error fully enclosed. It is less important for, say, reliable computation, provided one has determined that results are not corrupted, say by precision becoming too low for the first order model to hold up. I will give an example that more or less stradles these two worlds. See: http://ccsenet.org/journal/index.php/jmr/article/view/22480/14613 A counterexample is indicated to a conjecture from 2003. The code used to verify the counterexample invokes NSolve at 500 digits. If I recall correctly the exact computation would take longer than my patience allows, and finding an easier example that would be faster was likewise beyond my patience. I am confident that the counterexample holds up just fine. This despite the fact that one needs approximate arithmetic to see it through, and despite the fact that significance arithmetic is used to find and count real- valued solutions. Even the referee, who was in other respects quite skritchy, did not argue this point. (Had that happened, I would simply have convinced Mathematica to validate that or a different counterexample, and probably unsettled the referee even more in so doing.) > It seems to me that the default arithmetic is fairly complicated. Each > extended float number F has some other number E associated with it > (guess: it is a machine float representing something like log of > relative error), and calculations require mucking about with F and E > both, e.g. some calculations involving condition numbers relating to how > fast a function changes as the argument changes. Yes, that all comes from the chain rule and first derivatives. > I don't understand the derivative argument -- intervals can be computed > essentially by composition of interval functions. This may be overly > pessimistic compared to actually finding extrema, but it is, so far > as I know, fairly easy. Maybe I am missing something basic, but I don't think it is actually so easy. How do you determine the spread of an interval when evaluating some transcendental function? With trigs, on the real line, one can take advantage of periodicity, monotonicity between peaks and valleys, etc. But for arbitrary functions i would think one must work to either locate, or get outer bounds on, extremal values when evaluation on an interval. This does not strike me as either easy or computationally fast. > > A drawback, relative to interval arithmetic, is that as a first-order > > approximation, in terms of error propagation, significance arithmetic > > breaks down at low precision where higher-order terms can cause the > > estimates to be off. I will add that, at that point, intervals are going > > to give results that are also not terribly useful. > > An advocate of interval arithmetic (and at least for the moment I will > take this position :) ) would say that interval arithmetic guarantees > a valid answer. That is, the answer is inside the interval. Sometimes > the interval may be very wide, even infinitely wide. But it contains > the answer. Not so for significance arithmetic. While it may be > appropriately modest for the documentation for N[] to say that > N[expr,n] >>>attempts<<< to give a result ... This is not really > what interval arithmetic does. It guarantees... All fair points, though I do say a bit about reliability above. I will add to that here with a short description of the underlying mechanics. We have a bignum float and also a machine float to record and track precision of the bignum. While this is an implementation aspect, all the same it often makes sense to view a number as x + dx, where the 'dx' part indicates the error range. Error is tracked using derivatives to approximate. For example, Plis[] can be fully bounded in this first order model since (x+dx) + (y+dy) = (x+y) + (dx+dy) Well and good. What about Times? (x+dx) * (y+dy) = (x*y) + (x*dy + y*dx) + (dx*dy) The error terms tracked are the middle two-- the dx*dy is in effect dropped. So how bad is this error dropping? Consider how we actually represent our error. We use a machine double, and that itself has a certain granularity. A bit of thought, which I will not attempt to write out here, will show the following: for multiplication, the second order error that is discarded will be less than the granularity of error that can be recorded, provided the precision remains higher than that can be kept in a machine double. For all systems on which Mathematica runs, this amounts to around 16 digits. Another way to view it is this. Were we to add one ULP to our precision loss (we do not do this, ity's just hypothetical), we would fully bound error from Times provided precision is above the granularity implied by this ULP. (Paradoxically this means that using e.g. quad precision for error would require greater precision for certainty of enclosing this error. But on the flip side precision degradation would be far slower. Similar happens for e.g. inverting (as in 1/(x+dx) and taking of integer powers. While these are by no means the only operations one does, they are the most common in many algorithms at least of the sort I run across. There are a few more things I might be able to say but I think this one response has already taken me a bit afield. Maybe another post. Daniel Lichtblau Wolfram Research