Re: Compiling Runge-kutta
- To: mathgroup at smc.vnet.net
- Subject: [mg125661] Re: Compiling Runge-kutta
- From: João Paulo Pereira <joaopereira9 at gmail.com>
- Date: Mon, 26 Mar 2012 01:48:02 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201203200721.CAA11925@smc.vnet.net>
Dear Mathgroup I have a question regarding a problem I'm facing. Any help would be much appreciated. You may find the full description below Thanks in advance Joao Pereira ---------- Forwarded message ---------- From: Oliver Ruebenkoenig <ruebenko at wolfram.com> Date: 2012/3/25 Subject: [mg125661] Re: Compiling Runge-kutta To: Jo=E3o Paulo Pereira <joaopereira9 at gmail.com> Hi Joao, perhaps it were good if posted this message to MathGroup as well, it does not look like you did that. I will not have time to look into this immediately and other readers of MG might find it useful too. A first thought: Have a look at your complied function with CompilePrint and see if you can spot something that gives a hint on why the compiled function might not give the expected performance (e.g. call to MainEvaluate in a loop) Oliver On Sat, 24 Mar 2012, Jo=E3o Paulo Pereira wrote: Hi Oliver, you're right of course. > This is the part of my code relevant to this issue: > > (*These are global variables used throughout the program*) > > (*GLOBAL VARIABLES*) > > gbvm[t_, i_] = (2.2946301434601255*^-8 + > 0.000016958949537576912* Exp[0.06497640748638328*i]); > gbv\[Sigma][t_] = 0.8 ; > gbv\[Alpha][t_] = 7.0; > gbv\[Epsilon][t_] = 0.8; > gbvr[t_] = 0.03 ; > gbv\[Rho][t_, i_] = 0.019 ; > gbva[t_] = 0.65 ; > gbv\[Phi][t_] = 0.3; > gbv\[Delta][t_] = 0.04 ; > gbvs[t_] = 15.0 ; > gbvref[t_] = 65.0 ; > gbv\[Gamma][t_] = 0.1; > gbvz0[t_] = 0.5; > gbvzl[t_] = 0.49346 ; > gbvzp[t_] = 0.10794 ; > gbv\[Beta][t_] = 0.511459; > gbvdisc\[Rho][v_, t0_, t_] := > Exp[NIntegrate[ > gbv\[Rho][t0, xt - v] + gbvm[t0, xt - v], {xt, t0, t}]]; > gbvdiscr[v_, t0_, t_] := > Exp[Integrate[gbvr[t0] + gbvm[t0, xt - v], {xt, t0, t}]]; > gbvdiscrtr[v_, t0_, tr_, t_] := > Exp[Integrate[gbvr[t0] + gbvm[t0, xt - v], {xt, tr, t}]]; > > (* ------------------------------**-------------------*) > > (*CONTROL VARIBLES THAT FEED INTO THE ODE SYSTEM*) > > controlos1[t_, x_] := Module[{xeq1, dxeq1, xsl1, s}, > xeq1[s_] = > gbv\[Alpha][ > t0]/(gbvdisc\[Rho][v, t0, t]* > Power[s, gbv\[Epsilon][t0]]) - (x[[4]]*gbva[t0]*e[[1]]* > gbv\[Phi][t0])/(Power[1.0 - s, 1.0 - gbv\[Phi][t0]]) - > x[[6]] gbv\[Gamma][t0] e[[2]]; > dxeq1[s_] = -gbv\[Epsilon][t0] gbv\[Alpha][ > t0]/(gbvdisc\[Rho][v, t0, t]* > Power[s, gbv\[Epsilon][t0] + 1.0]) - (x[[4]]*gbva[t0]*e[[1]]* > gbv\[Phi][t0]*(1.0 - gbv\[Phi][t0]))/ > Power[1.0 - s, 2.0 - gbv\[Phi][t0]]; > xsl1 = newton[xeq1, dxeq1, 0.3, 10^(-10), 20, {0.0, 1.0}]; > {Power[x[[6]]*gbvdisc\[Rho][**v, t0, t], -1.0/gbv\[Sigma][t0]], xsl1, > 1.0 - xsl1} > ]; > > controlos2[t_, x_] := Module[{xd, xc, xeq2, dxeq2, s, xsl2}, > xd = (x[[5]] + (1.0 - gbvzl[t0]) x[[6]])*e[[2]]*x[[1]]; > xc = { > Power[x[[6]]*gbvdisc\[Rho][v, t0, t], -1.0/gbv\[Sigma][t0]], > Power[gbv\[Alpha][t0]/(xd***gbvdisc\[Rho][v, t0, t]), > 1.0/gbv\[Epsilon][t0]], > Power[x[[4]]*gbva[t0]*e[[1]]***gbv\[Phi][t0]/xd, > 1.0/(1.0 - gbv\[Phi][t0])] > }; > If[x[[4]] <= 0.0, {xc[[1]], 1.0, 0.0}, > If[xc[[2]] + xc[[3]] > 1.0, > xeq2[s_] = > gbv\[Alpha][ > t0]/(gbvdisc\[Rho][v, t0, t]* > Power[s, gbv\[Epsilon][t0]]) - (x[[4]]*gbva[t0]*e[[1]]* > gbv\[Phi][t0])/Power[1.0 - s, 1.0 - gbv\[Phi][t0]]; > dxeq2[ > s_] = -gbv\[Epsilon][t0] gbv\[Alpha][ > t0]/(gbvdisc\[Rho][v, t0, t]* > Power[s, gbv\[Epsilon][t0] + 1.0]) - (x[[4]]*gbva[t0]* > e[[1]]*gbv\[Phi][t0]*(1.0 - gbv\[Phi][t0]))/ > Power[1.0 - s, 2.0 - gbv\[Phi][t0]]; > xsl2 = newton[xeq2, dxeq2, 0.3, 10^(-10), 20, {0.0, 1.0}]; > {xc[[1]], xsl2, 1.0 - xsl2}, xc] > ] > ]; > > (*THE SYSTEM OF ODE TO BE SOLVED *) > (*The same system is divided into 2 stages with diferent functional forms= *) > > prehamiltonianfield1[t_, x_] := { > gbva[t0]*e[[1]]*Power[**controlos1[t, x][[3]], gbv\[Phi][t0]] - > gbv\[Delta][t0]*x[[1]], > 0.0, > (gbvr[t0] + gbvm[t0, t - v])*x[[3]] - controlos1[t, x][[1]] - > gbvz0[t0] + gbv\[Gamma][t0]*controlos1[t, x][[3]]*e[[2]], > gbv\[Delta][t0]*x[[4]], > 0.0, > -(gbvr[t0] + gbvm[t0, t - v])*x[[6]] > }; > > prehamiltonianfield2[t_, x_] := { > gbva[t0]*e[[1]]*Power[**controlos2[t, x][[3]], gbv\[Phi][t0]] - > gbv\[Delta][t0]*x[[1]], > x[[1]]*e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]), > (gbvr[t0] + gbvm[t0, t - v])*x[[3]] - controlos2[t, x][[1]] - > gbvz0[t0] + (1.0 - gbvzl[t0])*x[[1]]* > e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]), > gbv\[Delta][t0]*x[[4]] - (x[[5]] + (1.0 - gbvzl[t0]) x[[6]])* > e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]), > 0.0, > -(gbvr[t0] + gbvm[t0, t - v])*x[[6]] > }; > > (* ------------------------------**-----------*) > > (*ALGORITHMS THAT WILL BE USED*) > > (*NEWTON R1 ADAPTED TO SEARCH WITHIN A DOMAIN*) > > newton[f_, df_, p0_, \[CapitalDelta]_, max_, dom_] := > Module[{k, dz, z0, z1}, > Clear[k, dz, z0, z1]; > k = 0; > dz = 1.0; > z0 = N[p0]; > While[And[k < max, N[\[CapitalDelta]] < Abs[dz]], > dz = -(Apply[f, {z0}]/Apply[df, {z0}]); > z1 = z0 + dz; > If[z1 <= dom[[1]], z1 = z0 + 0.95*(dom[[1]] - z0), > If[z1 >= dom[[2]], z1 = z0 + 0.95*(dom[[2]] - z0), z1 = z0 + d= z]]; > k = k + 1; > z0 = z1;]; N[z1]]; > > (*EXPLICIT RK-4*) > > rk4[f_, t0_, tf_, x0_, n_] := Module[{h, step}, h = (tf - t0)/n; > step[{t_, x_}] := Module[{k1, k2, k3, k4}, k1 = h*f[t, x]; > k2 = h*f[t + 0.5 h, x + 0.5 k1]; > k3 = h*f[t + 0.5 h, x + 0.5 k2]; > k4 = h*f[t + h, x + k3]; > {t + h, x + 1/6*(k1 + 2 k2 + 2 k3 + k4)}]; > NestList[step, {t0, x0}, n]]; > > (*EXPLICIT RK-4 WITH COMPILE*) > makeCompRK[fIn_] := With[{f = fIn}, > Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}}, > Module[{k, h, K1, K2, K3, K4, SolList, x = x0}, > h = (tMax - t0)/n; > SolList = Table[{0., 0.}, {n + 1}]; > SolList[[1]] = x0; > Do[t = t0 + k h; > K1 = h f[t, x]; > K2 = h f[t + (0.5) h, x + (0.5) K1]; > K3 = h f[t + (0.5) h, x + (0.5) K2]; > K4 = h f[t + h, x + K3]; > x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4; > SolList[[k]] = x;, {k, 2, n + 1}]; > SolList]]]; > > (* ------------------------------**--------*) > > (*I USE THIS ROUTIN TO CALL THE RK4, BECAUSE I NEED TO INPUT THE DATA \ > IN A SPECIAL WAY*) > > consProg[f_, vp_, t0p_, ep_, x0s_, var_, n_] := Module[{ts, x0, tf}, > Clear[v, e, ts, tf]; > v = vp; ts = t0p; e = ep; > x0 = Join[x0s, var]; > tf = x0[[7]]; > rk4[f, ts, tf, x0[[1 ;; 6]], n] > ]; > > (*I USE THIS ROUTINE TO COMPUTE 2 STAGES OF THE LIFE-CYCLE*) > > consumerTR[f1_, f2_, vp_, t0p_, ep_, x0s_, var_, n1_, n2_] := > Module[{x0, tf, var1, x0s2, var2, phase1, phase2}, > Clear[v, t0, e, tf]; > v = vp; t0 = t0p; e = ep; > tf = var[[4]]; > var1 = Append[var[[1 ;; 3]], gbvs[t0]]; > If[tf < gbvs[t0], Abort[], > If[t0 < gbvs[t0], > phase1 = consProg[f1, v, t0, e, x0s, var1, n1]; > x0s2 = phase1[[n1 + 1]][[2, 1 ;; 3]]; > var2 = Append[phase1[[n1 + 1]][[2, 4 ;; 6]], tf]; > phase2 = > consProg[f2, v, phase1[[n1 + 1]][[1]], e, x0s2, var2, n2]; > Join[phase1, Drop[phase2, 1]], > consProg[f2, v, t0, x0s, var, n2], > Print["non evaluated inner If"]], Print["non evaluated outer If"]] > ]; > > (*NOW THE COMPILED VERSIONS of the last two routines) > > consProgCompile[f_, vp_, t0p_, ep_, x0s_, var_, n_] := > Module[{ts, x0, tf}, > Clear[v, e, ts, tf]; > v = vp; ts = t0p; e = ep; > x0 = Join[x0s, var]; > tf = x0[[7]]; > rk4[f, ts, tf, x0[[1 ;; 6]], n] > ]; > > consumerTRcompile[f1_, f2_, vp_, t0p_, ep_, x0s_, var_, n1_, n2_] := > Module[{x0, tf, var1, x0s2, var2, phase1, phase2}, > Clear[v, t0, e, tf]; > v = vp; t0 = t0p; e = ep; > tf = var[[4]]; > var1 = Append[var[[1 ;; 3]], gbvs[t0]]; > If[tf < gbvs[t0], Abort[], > If[t0 < gbvs[t0], > phase1 = consProgCompile[f1, v, t0, e, x0s, var1, n1]; > x0s2 = phase1[[n1 + 1]][[2, 1 ;; 3]]; > var2 = Append[phase1[[n1 + 1]][[2, 4 ;; 6]], tf]; > phase2 = > consProgCompile[f2, v, phase1[[n1 + 1]][[1]], e, x0s2, var2, n2]; > Join[phase1, Drop[phase2, 1]], > consProgCompile[f2, v, t0, x0s, var, n2], > Print["non evaluated inner If"]], Print["non evaluated outer If"]] > ]; > > (*----------------------------***) > (*RESULTS*) > > (*for this simple example...*) > f = Function[{t, x}, {x[[2]], t*1 - x[[1]] + 4/(x[[1]]^3)}]; > RK4Comp = makeCompRK[f]; > > (*both forms of the algorithm work, although the compiled is much \ > faster*) > > RK4Comp[{1.0, 1.0}, 0.0, 100.0, 50000]; // AbsoluteTiming > ={2.7456000, Null} > > rk4[f, 0.0, 100.0, {1.0, 1.0}, 50000]; // AbsoluteTiming > = {5.8900000, Null} > > but for the problem I am working on we get > > consumerTR[**prehamiltonianfield1, prehamiltonianfield2, 0.0, > 0.0, {1.8236, 0.95}, {0., 0., 0.}, {2.3, 0.12, 3.0, 65.}, 15, > 50]; // AbsoluteTiming > ={9.9698000, Null} > consumerTRcompile[**prehamiltonianfield1, prehamiltonianfield2, 0.0, > 0.0, {1.8236, 0.95}, {0., 0., 0.}, {2.3, 0.12, 3.0, 65.}, 15, > 50]; // AbsoluteTiming > ={9.8652000, Null} > > The execution time is similar which I guess is because there is some > problem > with the compiled version and it reverts to the uncomplied version. > > You've probably noticed that I am feeding the initial conditions in a wei= rd > way, instead of having a list of 6 variables, I have one with three and > another with 4, this last one has the remaining 3 variables and the > terminal > time. I am doing this because the value of the variables in the second li= st > (including the terminal time) are unknown and have to be such that they > solve a set of restrictions at terminal time. To do this I apply a newton > algorithm to find the zeros of that boundary condition which gives me the= se > initial conditions and the true terminal time. This implies that the > runge-kutta will be called many times wihile the newton algorithm is > running. So speed is really an importante issue here, also because I am > running this problem for a population and it will have to be run for all > generations of that population (different initial v - generation) and eve= ry > single generation will repeat this procedure for a given moment in > time....if one adds the fact that I will be following this population > through time, it becomes clear that the runge kutta algorithm will be use= d > thousands of times > > Since the right hand side of the ODE system contains switchs and implicit > functions that call a newton algorithm > is it still feasible to compile the rk algorithm? > > Thanks again in advance > > My best regards > > Joao Pereira > > > > > > > > > > > > > > > > On Fri, Mar 23, 2012 at 6:32 AM, Oliver Ruebenkoenig <ruebenko at wolfram.co= m > > > wrote: > > Joao, > > it's hard to say what the issue is without seeing the code. I > suspect you > get more answers if you actually post your code. Please make > sure it is > complete and easy to copy and past. > > Oliver > > > > On Tue, 20 Mar 2012, Joao wrote: > > > Hi there, I've been going through this thread (Vector Runge-Kutta > ODE solver with compilation?) and I would like to know if anyone could > give me an opinion on the problem I have. > > > > Basically I am using the same runge-kutta 4 taken from Jason > Cantarella's page than Dmitry was, but I am using the NestList > version. > > > > I really need to speed up this algorithm. When I did one of the > examples you provided on this thread, the compiled version solved it > at a duration of 7% of the non compiled version which was pretty cool. > > > > My problem was that when I tried to use it on the work I'm doing the > compiled and non compiled versions took the same time. My guess is > that the compiled version wasn't able to compute and it returned to > the non-compiled version. > > > > At this moment I don't know if I'm doing something wrong or if the > nature of the RHS of the functions I'm using prevents the compiled > algorithm to work in a proper way. > > > > I can send the code if necessary. > > > > Nevertheless, I may advance that my RHS is a system of 6 ODE > (hamiltonian), these ODE call other functions (controls). These last > functions (controls) are coded in a module box. Inside that box are > some implicit functions in R1 (hence a newton algorithm is called) and > there is also a switch - if a given condition is above a threshold the > control box assumes one form, bellow that threshold another form is > assumed. After all these calculations inside the control box it > returns simply a vector of order 3. > > > > I would appreciate any help. At the moment I do not know if this is > feasible with Compile or not. Maybe I have to put all this different > boxes inside a compile? > > > > Thanks in advance > > > > Joao Pereira > > > > > > > >
- References:
- Compiling Runge-kutta
- From: Joao <joaopereira9@gmail.com>
- Compiling Runge-kutta