Re: Root Finding Methods Gaurenteed to Find All Root Between (xmin, xmax)
- To: mathgroup at smc.vnet.net
- Subject: [mg112765] Re: Root Finding Methods Gaurenteed to Find All Root Between (xmin, xmax)
- From: Andrzej Kozlowski <akoz at mimuw.edu.pl>
- Date: Wed, 29 Sep 2010 04:16:53 -0400 (EDT)
First of all, let me state again that Semenev's paper and method is perfectly correct in the case of a equation involving only a single variable - which is the only case in which the RootSearch package works. It returns a the complete set of roots in an interval provided that there are no multiple roots in the interval, otherwise it will run forever. Of course, as with all computer programs this is subject to the usual limitations: if the roots are distinct but too close together you will run into various hardware and software limitations, so the "guarantee" is, of course, purely mathematical. But the same applies to all "exact" computer computations. In the case of equations in n-variables, Semenev's proof contains an error, which can be easily corrected and we have done so. The price of the correction is that the complexity of computations increases rather substantially with respect to the number of variables. Roughly, having n variables in the original algorithm is equivalent to having n^2 variables in the new. (As you can see, for n==1 there is no difference). Still, I think in the case n==2, which is the only one for which I am writing an implementation, I think the difference is not too bad. As for the rest of your questions,: I am in the process of writing all of this up carefully and implementing it in Mathematica. Since I am writing it elsewhere I see little point in duplicating my labour by also repeating it all here, and moreover in ASCII. If you wish I can send you the written up argument and implementation when I finish it. It should really be a simple task if I weren't so busy starting a new job. But in any case, it is virtually impossible to discuss mathematical proofs with someone who has not seen the relevant paper and does not have it in front of him. For example, instead of referring to Lemma 9, I would have to restate Lemma 9 and, moreover, do so using ASCII symbols. This I do not intend to do. So now some more detailed answeres. I will deal very briefly with questions 1) and 2) since I don't understand them at all. What do you mean by "apply interval analysis directly to the functions"? This makes no sense to me. Semenev proves two results. One is, that if the functions satisfy certain conditions in a box, then there is no root in the box. The second is that if the functions satisfy another sets of conditions, then there can be at most one root. (This is all under the assumption that there are no "multiple roots" in the box). These conditions involve upper bounds on the values of the functions and the values of the derivatives in the box. Any upper bounds will do; the better the upper bounds the faster the algorithm will work. I nterval arithmetic is used to find the upper bounds. Are you proposing to use interval aritmetic to do actual root finding rather than upper bound finding? How? Point 3. Here there is a confusion possibly resulting from some of my vague statements in past post on this topic (its dangerous to discuss mathematics in this way) but also because you clearly have not looked at Semenev's paper. Semenev shows that if the jacobian does not vanish in an entire box, then in that box there cannot be more than one root. This is only a lemma which he uses in his proof - it is not an assumption in his theorem! He does not assume that the jacobian never vanishes in the entire rectangle - that would be nonsense. He only assumes that there are no repeated roots in the rectangle. He proves however that if the jacobian does not vanish at all in some sub-rectangle than that sub-rectangle contains at most one root. However, as stated this is only true in the one dimensional case. In fact, the one dimensional argument is trivial and can easily be given here. It goes like this: Suppose in an open interval we have f(x0)==0 and f(x1)==0 and the derivative never vanishes in this interval. Use the mean value theorem like this: f(x1)==f(x0) + (x1-x0)f'(theta) where theta is some number between x0 and x1. Since f(x1)==f(x0)==0 this means (x1-x0)f'(theta)==0 hence f'(theta)==0. But we are assuming that f'(theta) does not vanish *anywhere* in the interval, hence we have a contradiction. So there can be at most one root in this interval. Unfortunately, Semenev simply assumes that this argument also applies to a functions from R^n to R^n and, in particular, that essentially the same statement of the mean-value theorem holds, with the derivative replaced by the Jacobian matrix (the equation is now a vector equation). This seems plausible and managed to get passed the referee of the paper and me (I reviewed for in Math Reviews) but it is well known that one cannot find a single vector theta which will make this statement hold. Instead, we have to proceed "component-wise", considering a different theta for each component f_i or the function f == (f1,f2,...,fn). We can still write the same sort of equation but now it looks like this: f(x1)==f(x0) + GJ(theta_1, theta_2,...,theta_n).(x1-x0) where GJ is the "generalized Jacobian". The generalized jacobian is a matrix obtained by taking the usual matrix of partial derivatives of all the function components with respect to tall the variables, but evaluating each component at a different point. Hence the generalized Jacobian matrix is a matrix valued function of n-vectors. The usual Jacobian is a matrix valued function of one vector (with n coordinates) obtained by composing the generalized Jacobian with the diagonal map. In all the statements in Semenev's paper it is necessary to replace ordinary Jacobian with the generalized one. Everything else remains valid. The case n==1 remains unchanged. Andrzej Kozlowski On 27 Sep 2010, at 23:00, Ingolf Dahl wrote: > It would be very good if the Semenov method can be transferred into a > program that it able to find all roots to multi-dimensional non-linear > equation systems. I have now also looked a little into this, but now have > three questions/statements: > > 1. The test T1 checks if the rectangle is free of roots. This is implemented > by comparing the center function values in the rectangle with upper limits > for the spatial derivatives times half the length of the rectangle. The > upper limits of the spatial derivatives are calculated by interval analys= is. > Why is it done in that way? Why not apply interval analysis directly on t= he > functions as such? If this derivative method gives better estimates than = an > direct interval analysis, then I think it would be a good idea to > incorporate such a formula in the interval analysis of functions. > > 2. The corresponding question is relevant for the T2 test. Here the task > seems to be to check if the Jacobian determinant is non-zero, and this te= st > is done by comparing the center value in the rectangle with upper limits = for > the spatial derivatives of the Jacobian determinant times half the length= of > the rectangle. The upper limits of the spatial derivatives are calculated= by > interval analysis. Why is it done in that way? Why not apply interval > analysis directly on the Jacobian determinant as such? Then the program d= oes > not need to evaluate the second-order derivatives of the functions in the > equations. > > 3. The reason to check if the Jacobian determinant is non-zero is that th= is > is said to check if there is at most a single root in the rectangle. > However, I looked in my textbook from my first year at the university > ("Matematisk analys II" by Hylt=E9n-Cavallius and Sandgren, Lund 1965, in > Swedish), and found discussion about this and a counterexample. The autho= rs > emphasize that a non-zero Jacobian implies that the function values aroun= d a > point are unique in an environment, which is small enough, but we might h= ave > several identical function values in a larger area. Then we have several > roots to the corresponding equation. I will repeat the counterexample her= e, > slightly modified, since I think that it is of interest to several of the > readers of MathGroup. > Suppose that we have -1/4 <== x<== 1/4 and 0 <== y <== 5*Pi/2 and the equ= ations > > Exp[x]*Sin[y] - 1/Sqrt[2] ==== 0 > Exp[x]*Cos[y] - 1/Sqrt[2] ==== 0 > > Then the Jacobian is Exp[2x], which is larger than zero. The rectangle al= so > passes Test2, in spite of the two roots {x -> 0, y -> Pi/4} and {x -> 0, = y > -> 9 Pi/4}. Maybe this is the error Daniel Lichtblau has noticed, but > anyhow, the paper by Semenov seems now a bit less impressive. And a plus > score for the mathematicians of the previous century. We thus need a bett= er > test, and a valid mathematical proof of it, to be able to search for all > roots in a guaranteed way. My feeling is that one might test the elements= of > the Jacobian matrix - if no one of them changes sign within the rectangle > (or hyper rectangle), and if the Jacobian determinant is non-zero of one > sign, then I guess that multiple roots are impossible. But that I have no= t > proved, and there might be better tests. Any suggestions? And what is mea= nt > by "generalized jacobian" in this context? > > Best regards > > Ingolf Dahl > ingolf.dahl at telia.com > >> -----Original Message----- >> From: Andrzej Kozlowski [mailto:akoz at mimuw.edu.pl] >> Sent: den 20 september 2010 11:42 >> To: mathgroup at smc.vnet.net >> Subject: [mg112561] Re: Root Finding Methods Gaurenteed to Find All Root > Between >> (xmin, xmax) >> >> The method that I referred to does have a practical limitation in that >> it relies on an implementation of interval arithmetic or some equivalent >> method of finding upper bounds of the function(s) involved in our >> equations and their derivatives (or generalised jacobians in the >> multi-dimensional case). In other words, method will work as long as we >> can compute f[Interval{{a,b}]], f'[Interval[{a,b}]] etc. >> >> The algorithm works as follows. It subdivides the interval (or >> rectangle) in which we are looking for roots into a finite number of >> sub-rectangles. It them performs two tests on them, let's call them T1 >> and T2. If a rectangle passes the testT1 we no for sure that there are >> no roots in it and we eliminate it from further search. If a rectangle >> fails test T1 we apply to it test T2. If the rectangle passes T2 we know >> with certainty that there is at most a single root in the rectangle. We >> then apply Newton's method to the rectangle - it will either find the >> required root or will not converge. This will always determine whether >> there is a root in the rectangle or not. If a rectangle fails both tests >> it is subdivided further into smaller and smaller rectangles. As there >> are only finitely many roots in the original rectangle they will all be >> separated and we will now for sure that we have found all the roots. >> Since the process goes on until all roots have been found the example >> described below does not contradict the method. >> >> A problem occurs when there is a multiple root. A rectangle containing >> such a root will always fail both tests and the algorithm will never >> complete. We can of course always decide to stop search when the >> rectangle containing possibly multiple roots is sufficiently small. We >> will then know that there is possibly a multiple root in the remaining >> rectangle. If we wish we can of course let the method run longer. If the >> roots are really distinct the process will stop for sure, but if they >> are multiple it can go on for ever. >> >> Here may be the right place to add that Daniel Lichtblau has noticed >> that the method given by Semenov in the case systems with 2 or more >> equations and variables contains a mistake (it is fine for the one >> dimensional case). There is a mistake in the application of the mean >> value theorem for vector valued functions in Semenev's published >> article, which causes the algorithm to miss certain roots. However, we >> have been able to find the correct version of Semenev's method based on >> the concept of generalized jacobian. I am now writing up our corrected >> version of Semenev's method and will soon implement it in Mathematica. >> Eventually I will replace my demonstrations on the Demosntrations site >> with new ones using the new version, which we believe to be correct. >> >> Andrzej Kozlowski >> >> >> >> On 19 Sep 2010, at 16:50, Ted Ersek wrote: >> >>> In [1] I gave a long response to the thread [FindRoots] >>> >>> It seems in [2] Andrzej Kozlowski was saying there are ways to >>> obtain the complete set of roots (provably complete) over a >>> closed and bounded set provided the function is C2 continuous >>> and does not have multiple roots in the domain of definition. >>> >>> I suppose he is talking about numeric methods to find all the roots. >>> If that is the case I would expect these methods to have practical >>> limitations in what they can do? I mean a function could be very >>> ill-behaved and still meet the conditions above. Consider f[x] below. >>> >>> pnts==Table[ {x,Log[x]}, {x,3.0,250.0,0.01}]; >>> Part[ pnts, 16000, 2] == -0.05; >>> f==Interpolation[ pnts, Method:>Spline ]; >>> >>> ==46rom the output of Plot[f[x], {x,3,250}] it looks like f[x] is > monotonic, >>> well behaved and having no roots between (3, 250). In fact it is >>> C2 continuous, but it does have two root close to 18.99. It actually i= s >>> well behaved except for the small interval between (18.992, 19.006). >>> >>> Suppose we wanted to find all the roots between (3,250) of a function >>> defined as SomeNumericalAlgorithm[x] and the function always evaluated > to >>> the same value as the InterpolatingFunction above. How can a numeric >>> RootFinding >>> algorithm be gaurenteed to find these roots near 18.99 if it doesn't > have >>> the >>> good furtune of taking one or more sample between (18.992, 19.006), or >>> sampling higher order derivatives in a interval a bit larger than > (18.992, >>> 19.006)? >>> >>> (******* References ********) >>> >>> [1] http://forums.wolfram.com/mathgroup/archive/2010/Sep/msg00255.htm= l >>> >>> [2] == >> http://forums.wolfram.com/mathgroup/archive/2010/Sep/msg00262.html >>> >>> > >