MathGroup Archive: March 2012 [00439]

[Date Index] [Thread Index] [Author Index]
Re: Compiling Runge-kutta
To: mathgroup at smc.vnet.net
Subject: [mg125661] Re: Compiling Runge-kutta
From: João Paulo Pereira <joaopereira9 at gmail.com>
Date: Mon, 26 Mar 2012 01:48:02 -0500 (EST)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <201203200721.CAA11925@smc.vnet.net>
Dear Mathgroup

I have a question regarding a problem I'm facing. Any help would be much
appreciated. You may find the full description below

Thanks in advance

Joao Pereira



---------- Forwarded message ----------
From: Oliver Ruebenkoenig <ruebenko at wolfram.com>
Date: 2012/3/25
Subject: [mg125661] Re: Compiling Runge-kutta
To: Jo=E3o Paulo Pereira <joaopereira9 at gmail.com>



Hi Joao,

perhaps it were good if posted this message to MathGroup as well, it does
not look like you did that. I will not have time to look into this
immediately and other readers of MG might find it useful too.

A first thought: Have a look at your complied function with CompilePrint
and see if you can spot something that gives a hint on why the compiled
function might not give the expected performance (e.g. call to MainEvaluate
in a loop)

Oliver


On Sat, 24 Mar 2012, Jo=E3o Paulo Pereira wrote:

 Hi Oliver, you're right of course.
> This is the part of my code relevant to this issue:
>
> (*These are global variables used throughout the program*)
>
> (*GLOBAL VARIABLES*)
>
> gbvm[t_, i_] = (2.2946301434601255*^-8 +
>     0.000016958949537576912* Exp[0.06497640748638328*i]);
> gbv\[Sigma][t_] = 0.8 ;
> gbv\[Alpha][t_] = 7.0;
> gbv\[Epsilon][t_] = 0.8;
> gbvr[t_] = 0.03 ;
> gbv\[Rho][t_, i_] = 0.019 ;
> gbva[t_] = 0.65 ;
> gbv\[Phi][t_] = 0.3;
> gbv\[Delta][t_] = 0.04 ;
> gbvs[t_] = 15.0  ;
> gbvref[t_] = 65.0 ;
> gbv\[Gamma][t_] = 0.1;
> gbvz0[t_] = 0.5;
> gbvzl[t_] = 0.49346 ;
> gbvzp[t_] = 0.10794 ;
> gbv\[Beta][t_] = 0.511459;
> gbvdisc\[Rho][v_, t0_, t_] :=
>   Exp[NIntegrate[
>     gbv\[Rho][t0, xt - v] + gbvm[t0, xt - v], {xt, t0, t}]];
> gbvdiscr[v_, t0_, t_] :=
>   Exp[Integrate[gbvr[t0] + gbvm[t0, xt - v], {xt, t0, t}]];
> gbvdiscrtr[v_, t0_, tr_, t_] :=
>   Exp[Integrate[gbvr[t0] + gbvm[t0, xt - v], {xt, tr, t}]];
>
> (* ------------------------------**-------------------*)
>
> (*CONTROL VARIBLES THAT FEED INTO THE ODE SYSTEM*)
>
> controlos1[t_, x_] := Module[{xeq1, dxeq1, xsl1, s},
>    xeq1[s_] =
>     gbv\[Alpha][
>        t0]/(gbvdisc\[Rho][v, t0, t]*
>         Power[s, gbv\[Epsilon][t0]]) - (x[[4]]*gbva[t0]*e[[1]]*
>         gbv\[Phi][t0])/(Power[1.0 - s, 1.0 - gbv\[Phi][t0]]) -
>      x[[6]] gbv\[Gamma][t0] e[[2]];
>    dxeq1[s_] = -gbv\[Epsilon][t0] gbv\[Alpha][
>         t0]/(gbvdisc\[Rho][v, t0, t]*
>          Power[s, gbv\[Epsilon][t0] + 1.0]) - (x[[4]]*gbva[t0]*e[[1]]*
>         gbv\[Phi][t0]*(1.0 - gbv\[Phi][t0]))/
>       Power[1.0 - s, 2.0 - gbv\[Phi][t0]];
>    xsl1 = newton[xeq1, dxeq1, 0.3, 10^(-10), 20, {0.0, 1.0}];
>    {Power[x[[6]]*gbvdisc\[Rho][**v, t0, t], -1.0/gbv\[Sigma][t0]], xsl1,
>      1.0 - xsl1}
>    ];
>
> controlos2[t_, x_] := Module[{xd, xc, xeq2, dxeq2, s, xsl2},
>    xd = (x[[5]] + (1.0 - gbvzl[t0]) x[[6]])*e[[2]]*x[[1]];
>    xc = {
>      Power[x[[6]]*gbvdisc\[Rho][v, t0, t], -1.0/gbv\[Sigma][t0]],
>      Power[gbv\[Alpha][t0]/(xd***gbvdisc\[Rho][v, t0, t]),
>       1.0/gbv\[Epsilon][t0]],
>      Power[x[[4]]*gbva[t0]*e[[1]]***gbv\[Phi][t0]/xd,
>       1.0/(1.0 - gbv\[Phi][t0])]
>      };
>    If[x[[4]] <= 0.0, {xc[[1]], 1.0, 0.0},
>     If[xc[[2]] + xc[[3]] > 1.0,
>       xeq2[s_] =
>       gbv\[Alpha][
>          t0]/(gbvdisc\[Rho][v, t0, t]*
>           Power[s, gbv\[Epsilon][t0]]) - (x[[4]]*gbva[t0]*e[[1]]*
>           gbv\[Phi][t0])/Power[1.0 - s, 1.0 - gbv\[Phi][t0]];
>       dxeq2[
>        s_] = -gbv\[Epsilon][t0] gbv\[Alpha][
>           t0]/(gbvdisc\[Rho][v, t0, t]*
>            Power[s, gbv\[Epsilon][t0] + 1.0]) - (x[[4]]*gbva[t0]*
>           e[[1]]*gbv\[Phi][t0]*(1.0 - gbv\[Phi][t0]))/
>         Power[1.0 - s, 2.0 - gbv\[Phi][t0]];
>       xsl2 = newton[xeq2, dxeq2, 0.3, 10^(-10), 20, {0.0, 1.0}];
>      {xc[[1]], xsl2, 1.0 - xsl2}, xc]
>     ]
>    ];
>
> (*THE SYSTEM OF ODE TO BE SOLVED *)
> (*The same system is divided into 2 stages with diferent functional forms=
*)
>
> prehamiltonianfield1[t_, x_] := {
>    gbva[t0]*e[[1]]*Power[**controlos1[t, x][[3]], gbv\[Phi][t0]] -
>     gbv\[Delta][t0]*x[[1]],
>    0.0,
>    (gbvr[t0] + gbvm[t0, t - v])*x[[3]] - controlos1[t, x][[1]] -
>     gbvz0[t0] + gbv\[Gamma][t0]*controlos1[t, x][[3]]*e[[2]],
>    gbv\[Delta][t0]*x[[4]],
>    0.0,
>    -(gbvr[t0] + gbvm[t0, t - v])*x[[6]]
>    };
>
> prehamiltonianfield2[t_, x_] := {
>    gbva[t0]*e[[1]]*Power[**controlos2[t, x][[3]], gbv\[Phi][t0]] -
>     gbv\[Delta][t0]*x[[1]],
>    x[[1]]*e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]),
>    (gbvr[t0] + gbvm[t0, t - v])*x[[3]] - controlos2[t, x][[1]] -
>     gbvz0[t0] + (1.0 - gbvzl[t0])*x[[1]]*
>      e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]),
>    gbv\[Delta][t0]*x[[4]] - (x[[5]] + (1.0 - gbvzl[t0]) x[[6]])*
>      e[[2]]*(1.0 - controlos2[t, x][[2]] - controlos2[t, x][[3]]),
>    0.0,
>    -(gbvr[t0] + gbvm[t0, t - v])*x[[6]]
>    };
>
> (* ------------------------------**-----------*)
>
> (*ALGORITHMS THAT WILL BE USED*)
>
> (*NEWTON R1 ADAPTED TO SEARCH WITHIN A DOMAIN*)
>
> newton[f_, df_, p0_, \[CapitalDelta]_, max_, dom_] :=
>   Module[{k, dz, z0, z1},
>    Clear[k, dz, z0, z1];
>    k = 0;
>    dz = 1.0;
>    z0 = N[p0];
>    While[And[k < max, N[\[CapitalDelta]] < Abs[dz]],
>     dz = -(Apply[f, {z0}]/Apply[df, {z0}]);
>     z1 = z0 + dz;
>     If[z1 <= dom[[1]], z1 = z0 + 0.95*(dom[[1]] - z0),
>      If[z1 >= dom[[2]], z1 = z0 + 0.95*(dom[[2]] - z0), z1 = z0 + d=
z]];
>     k = k + 1;
>     z0 = z1;]; N[z1]];
>
> (*EXPLICIT RK-4*)
>
> rk4[f_, t0_, tf_, x0_, n_] := Module[{h, step}, h = (tf - t0)/n;
>    step[{t_, x_}] := Module[{k1, k2, k3, k4}, k1 = h*f[t, x];
>      k2 = h*f[t + 0.5 h, x + 0.5 k1];
>      k3 = h*f[t + 0.5 h, x + 0.5 k2];
>      k4 = h*f[t + h, x + k3];
>      {t + h, x + 1/6*(k1 + 2 k2 + 2 k3 + k4)}];
>    NestList[step, {t0, x0}, n]];
>
> (*EXPLICIT RK-4 WITH COMPILE*)
> makeCompRK[fIn_] := With[{f = fIn},
>    Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
>     Module[{k, h, K1, K2, K3, K4, SolList, x = x0},
>      h = (tMax - t0)/n;
>      SolList = Table[{0., 0.}, {n + 1}];
>      SolList[[1]] = x0;
>      Do[t = t0 + k h;
>       K1 = h f[t, x];
>       K2 = h f[t + (0.5) h, x + (0.5) K1];
>       K3 = h f[t + (0.5) h, x + (0.5) K2];
>       K4 = h f[t + h, x + K3];
>       x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
>       SolList[[k]] = x;, {k, 2, n + 1}];
>      SolList]]];
>
> (* ------------------------------**--------*)
>
> (*I USE THIS ROUTIN TO CALL THE RK4, BECAUSE I NEED TO INPUT THE DATA \
> IN A SPECIAL WAY*)
>
> consProg[f_, vp_, t0p_, ep_, x0s_, var_, n_] := Module[{ts, x0, tf},
>    Clear[v, e, ts, tf];
>    v = vp; ts = t0p; e = ep;
>    x0 = Join[x0s, var];
>    tf = x0[[7]];
>    rk4[f, ts, tf, x0[[1 ;; 6]], n]
>    ];
>
> (*I USE THIS ROUTINE TO COMPUTE 2 STAGES OF THE LIFE-CYCLE*)
>
> consumerTR[f1_, f2_, vp_, t0p_, ep_, x0s_, var_, n1_, n2_] :=
>   Module[{x0, tf, var1, x0s2, var2, phase1, phase2},
>    Clear[v, t0, e, tf];
>    v = vp; t0 = t0p; e = ep;
>    tf = var[[4]];
>    var1 = Append[var[[1 ;; 3]], gbvs[t0]];
>    If[tf < gbvs[t0], Abort[],
>     If[t0 < gbvs[t0],
>      phase1 = consProg[f1, v, t0, e, x0s, var1, n1];
>      x0s2 = phase1[[n1 + 1]][[2, 1 ;; 3]];
>      var2 = Append[phase1[[n1 + 1]][[2, 4 ;; 6]], tf];
>      phase2 =
>       consProg[f2, v, phase1[[n1 + 1]][[1]], e, x0s2, var2, n2];
>      Join[phase1, Drop[phase2, 1]],
>      consProg[f2, v, t0, x0s, var, n2],
>      Print["non evaluated inner If"]], Print["non evaluated outer If"]]
>    ];
>
> (*NOW THE COMPILED VERSIONS of the last two routines)
>
> consProgCompile[f_, vp_, t0p_, ep_, x0s_, var_, n_] :=
>   Module[{ts, x0, tf},
>    Clear[v, e, ts, tf];
>    v = vp; ts = t0p; e = ep;
>    x0 = Join[x0s, var];
>    tf = x0[[7]];
>    rk4[f, ts, tf, x0[[1 ;; 6]], n]
>    ];
>
> consumerTRcompile[f1_, f2_, vp_, t0p_, ep_, x0s_, var_, n1_, n2_] :=
>   Module[{x0, tf, var1, x0s2, var2, phase1, phase2},
>    Clear[v, t0, e, tf];
>    v = vp; t0 = t0p; e = ep;
>    tf = var[[4]];
>    var1 = Append[var[[1 ;; 3]], gbvs[t0]];
>    If[tf < gbvs[t0], Abort[],
>     If[t0 < gbvs[t0],
>      phase1 = consProgCompile[f1, v, t0, e, x0s, var1, n1];
>      x0s2 = phase1[[n1 + 1]][[2, 1 ;; 3]];
>      var2 = Append[phase1[[n1 + 1]][[2, 4 ;; 6]], tf];
>      phase2 =
>       consProgCompile[f2, v, phase1[[n1 + 1]][[1]], e, x0s2, var2, n2];
>      Join[phase1, Drop[phase2, 1]],
>      consProgCompile[f2, v, t0, x0s, var, n2],
>      Print["non evaluated inner If"]], Print["non evaluated outer If"]]
>    ];
>
> (*----------------------------***)
> (*RESULTS*)
>
> (*for this simple example...*)
> f = Function[{t, x}, {x[[2]], t*1 - x[[1]] + 4/(x[[1]]^3)}];
> RK4Comp = makeCompRK[f];
>
> (*both forms of the algorithm work, although the compiled is much \
> faster*)
>
> RK4Comp[{1.0, 1.0}, 0.0, 100.0, 50000]; // AbsoluteTiming
> ={2.7456000, Null}
>
> rk4[f, 0.0, 100.0, {1.0, 1.0}, 50000]; // AbsoluteTiming
> = {5.8900000, Null}
>
> but for the problem I am working on we get
>
> consumerTR[**prehamiltonianfield1, prehamiltonianfield2, 0.0,
>    0.0, {1.8236, 0.95}, {0., 0., 0.}, {2.3, 0.12, 3.0, 65.}, 15,
>    50]; // AbsoluteTiming
> ={9.9698000, Null}
> consumerTRcompile[**prehamiltonianfield1, prehamiltonianfield2, 0.0,
>    0.0, {1.8236, 0.95}, {0., 0., 0.}, {2.3, 0.12, 3.0, 65.}, 15,
>    50]; // AbsoluteTiming
> ={9.8652000, Null}
>
> The execution time is similar which I guess is because there is some
> problem
> with the compiled version and it reverts to the uncomplied version.
>
> You've probably noticed that I am feeding the initial conditions in a wei=
rd
> way, instead of having a list of 6 variables, I have one with three and
> another with 4, this last one has the remaining 3 variables and the
> terminal
> time. I am doing this because the value of the variables in the second li=
st
> (including the terminal time) are unknown and have to be such that they
> solve a set of restrictions at terminal time. To do this I apply a newton
> algorithm to find the zeros of that boundary condition which gives me the=
se
> initial conditions and the true terminal time. This implies that the
> runge-kutta will be called many times wihile the newton algorithm is
> running. So speed is really an importante issue here, also because I am
> running this problem for a population and it will have to be run for all
> generations of that population (different initial v - generation) and eve=
ry
> single generation will repeat this procedure for a given moment in
> time....if one adds the fact that I will be following this population
> through time, it becomes clear that the runge kutta algorithm will be use=
d
> thousands of times
>
> Since the right hand side of the ODE system contains switchs and implicit
> functions that call a newton algorithm
> is it still feasible to compile the rk algorithm?
>
> Thanks again in advance
>
> My best regards
>
> Joao Pereira
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Fri, Mar 23, 2012 at 6:32 AM, Oliver Ruebenkoenig <ruebenko at wolfram.co=
m
> >
> wrote:
>
>      Joao,
>
>      it's hard to say what the issue is without seeing the code. I
>      suspect you
>      get more answers if you actually post your code. Please make
>      sure it is
>      complete and easy to copy and past.
>
>      Oliver
>
>
>
>      On Tue, 20 Mar 2012, Joao wrote:
>
> > Hi there, I've been going through this thread (Vector Runge-Kutta
> ODE solver with compilation?) and I would like to know if anyone could
> give me an opinion on the problem I have.
> >
> > Basically I am using the same runge-kutta 4 taken from Jason
> Cantarella's page than Dmitry was, but I am using the NestList
> version.
> >
> > I really need to speed up this algorithm. When I did one of the
> examples you provided on this thread, the compiled version solved it
> at a duration of 7% of the non compiled version which was pretty cool.
> >
> > My problem was that when I tried to use it on the work I'm doing the
> compiled and non compiled versions took the same time. My guess is
> that the compiled version wasn't able to compute and it returned to
> the non-compiled version.
> >
> > At this moment I don't know if I'm doing something wrong or if the
> nature of the RHS of the functions I'm using prevents the compiled
> algorithm to work in a proper way.
> >
> > I can send the code if necessary.
> >
> > Nevertheless, I may advance that my RHS is a system of 6 ODE
> (hamiltonian), these ODE call other functions (controls). These last
> functions (controls) are coded in a module box. Inside that box are
> some implicit functions in R1 (hence a newton algorithm is called) and
> there is also a switch - if a given condition is above a threshold the
> control box assumes one form, bellow that threshold another form is
> assumed. After all these calculations inside the control box it
> returns simply a vector of order 3.
> >
> > I would appreciate any help. At the moment I do not know if this is
> feasible with Compile or not. Maybe I have to put all this different
> boxes inside a compile?
> >
> > Thanks in advance
> >
> > Joao Pereira
> >
> >
>
>
>
>
References:
- Compiling Runge-kutta
  - From: Joao <joaopereira9@gmail.com>
Prev by Date: Re: new functional operator
Next by Date: DialogInput in ButtonFunction
Previous by thread: Re: Compiling Runge-kutta
Next by thread: Re: Compiling Runge-kutta