Re: Optimizing FindMinimum for speed using intermediate calculations.

*To*: mathgroup at christensen.cybernetics.net*Subject*: [mg1247] Re: Optimizing FindMinimum for speed using intermediate calculations.*From*: tomi (Tom Issaevitch)*Date*: Fri, 26 May 1995 06:40:22 -0400*Organization*: Wolfram Research, Inc.

From: Paul A. Rubin, rubin at msu.edu Date: 25 May 1995 05:31:41 GMT In article <3q14nt$725 at news0.cybernetics.net> Paul A. Rubin, rubin at msu.edu writes: >In article <3pjl4l$7h0 at news0.cybernetics.net>, > Bill Harbaugh <harbaugh at students.wisc.edu> wrote: >->Hello >-> >->I am trying to maximize a likelihood function, using FindMinimum. This >->involves finding the parameters (x,y in my simplified example) that max >->(min, in my example) the sum of the values of a function applied to some >->data (ddist, in my example). >-> >->My actual problem is very long and my question is about how to optimize >->my mma code for speed. >-> >->This function involves some intermediate results (x1,y1 in my example) >->which vary across the x,y parameters but are constant across the d's. >->(In my actual problem, these intermediate results are found using NSolve, >->so I am specifying the functions using := rather than =). I am trying to >->only calculate these once for each iteration on x,y, and then apply the >->results to each of the d's in the list ddist. The code below shows how I >->do this without trying to optimize. >-> >-> >->ddist={1,2,3,4,5,6,7,8,9,10} >->x1:=x+1 >->y1:=y+2 >->l[d_]:=(x1*d)^2+(y1*d)^2 >-> >->f=Apply[Plus, >-> Map[l,ddist] >-> ] >-> >->FindMinimum[f,{x,{-1,1}},{y,{-1,1}}] >-> >-> >-> >-> >->This works, but it seems the following should be faster. >-> >->Clear[l]; >->Clear[f]; >->l[d_]:=(x1*d)^2+(y1*d)^2 >-> >->f={x1=x+1,y1=y+2,Apply[Plus, >-> Map[l,ddist] >-> ]} >-> >->FindMinimum[f[[3]],{x,{-1,1}},{y,{-1,1}}] >-> >->This also works, but timing results are inconclusive. I would appreciate >->suggestions on any other approaches that might speed this up. I am also >->asking mma support for help, and will forward their reply. >-> >->Bill Harbaugh >->harbaugh at students.wisc.edu >-> >I don't see why the second version should be faster. In both cases, you >compute an expression (f in the first case, f[[3]] in the second) that is >quadratic in x and y, and feed that to FindMinimum. The only difference in >the two calls to FindMinimum is that the second forces FindMinimum to >extract the third entry in a list (f), which would slow it down a tad. As >to the steps leading up to FindMinimum, I don't see any meaningful >difference. > >I suspect, though, that your example above does not illustrate your actual >problem. In both versions of the example, you end up with an objective >function directly computable from x and y. It sounds as though, in your >actual problem, there is an intermediate step (getting x1, y1 from x and y) >that cannot be written in closed form. That being the case, the question >is: where are your CPU cycles going? Is the time consuming part getting >from {x, y} to {x1, y1}, or getting from {x1, y1} to f (and its >derivatives)? Or is it the sheer number of iterations you're doing? > >Off the cuff, the only suggestion I can give you for speeding up your >problem (and it's a shot in the dark) would be to start with a subset of >the data, solve using just the subset (which should go faster), then add >back some of the deleted observations and solve again, using the previous >solution as the starting values for FindMinimum. That way, the early >iterations (getting you into the general vicinity of the optimum) are done >at lower computational cost. You could also start with a fairly sloppy >precision goal and/or iteration limit, and get pickier as you add points >(and, hopefully, get closer to the optimum). > >Unfortunately, FindMinimum does not give you a list of intermediate results >for x and y. Steepest descent has been known to zigzag its way toward an >optimum. If you knew that was happening, you could switch to a fancier >optimization method (something based on conjugate gradients or quasi-Newton >methods) and perhaps get faster convergence. Then again, you'd have to >program it yourself. > >Paul > > >************************************************************************** >* Paul A. Rubin Phone: (517) 432-3509 * >* Department of Management Fax: (517) 432-1111 * >* Eli Broad Graduate School of Management Net: RUBIN at MSU.EDU * >* Michigan State University * >* East Lansing, MI 48824-1122 (USA) * >************************************************************************** >Mathematicians are like Frenchmen: whenever you say something to them, >they translate it into their own language, and at once it is something >entirely different. J. W. v. GOETHE > > > Just to clarify, FindMinimum *does* use conjugate gradient decent. This is improperly documented. Further, as shown below, you can have FindMinimum print out the argument values as it proceeds. One way to speed it up is to use Compile --- compile both the function and the gradient and pass the latter as an option to FindMinimum. Here is an example (in the example, Compile resulted in only a slight improvement) In[30]:= h[u_, v_, x_, y_, z_]:= x^2 u^2 + (y-x+1)^4 (z-3)^2+(u^2-v)^2(x^2-1)^2 In[31]:= Timing[FindMinimum[ Print[{u,v,x,y,z, a=h[u,v,x,y,z]}];a, {u, 1}, {v, 1}, {x, 1}, {y, 1}, {z, 1}]] (* I have suppressed the output from Print. You can have any compound statement as the first argument. This also works in NIntegrate. *) Out[31]= -6 {0.8 Second, {4.44451 10 , {u -> 0.000517278, v -> 0.970884, x -> 1.00019, y -> 0.0332128, z -> 1.15682}}} In[32]:= g = Compile[{u,v,x,y,z}, Evaluate[h[u,v,x,y,z]]] Out[32]= 2 2 2 2 2 2 CompiledFunction[{u, v, x, y, z}, u x + (u - v) (-1 + x ) + 4 2 (1 - x + y) (-3 + z) , -CompiledCode-] In[33]:= ({g1,g2,g3,g4,g5} = Compile[{u,v,x,y,z}, Evaluate[D[h[u,v,x,y,z], #]]]& /@ {u,v,x,y,z}); In[34]:= Timing[FindMinimum[ g[u,v, x,y,z], {u,1},{v,1}, {x, 1}, {y, 1}, {z, 1}, Gradient -> #]& @@ {{g1[u,v,x,y,z],g2[u,v,x,y,z],g3[u,v,x,y,z],g4[u,v,x,y,z],g5[u,v,x,y,z]}}] (* Harmless Compile error messages suppressed *) Out[34]= -6 {0.683333 Second, {4.44451 10 , {u -> 0.000517278, v -> 0.970884, x -> 1.00019, y -> 0.0332128, z -> 1.15682}}} Tom Wolfram Research