Re: Compilation: Avoiding inlining
- To: mathgroup at smc.vnet.net
- Subject: [mg121640] Re: Compilation: Avoiding inlining
- From: Oliver Ruebenkoenig <ruebenko at wolfram.com>
- Date: Fri, 23 Sep 2011 03:44:41 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201109221127.HAA26804@smc.vnet.net>
On Thu, 22 Sep 2011, DmitryG wrote:

> On Sep 21, 5:41 am, DmitryG <einsch... at gmail.com> wrote:
>> On Sep 20, 6:08 am, David Bailey <d... at removedbailey.co.uk> wrote:
>>
>>> On 16/09/2011 12:08, Oliver Ruebenkoenig wrote:
>>
>>>> On Fri, 16 Sep 2011, DmitryG wrote:
>>
>>>>> Here is a program with (* Definition of the equations *) made one-step,
>>>>> that performs the same as in my previous post....................
>>
>>> I tried pasting your example into Mathematica, but unfortunately there
>>> seems to be a variable 'x' which is undefined - presumably some input
>>> data. It might be worth posting a complete example, so that people can
>>> explore how to get decent performance.
>>
>>> Error:
>>
>>> Part::partd: "Part specification x[[1]] is longer than depth of object. "
>>
>>> David Bailey
>>> http://www.dbaileyconsultancy.co.uk
>>
>> Hi David,
>>
>> The RK4 procedure here works with the solution vector x whose initial
>> value is defined after the RK4 procedure and the equations are
>> defined. Mathematica does not know the length of x at the beginning
>> and this is why it complains. You can ignore these complaints. Of the
>> several codes posted above, that in Oliver's 19 September post is the
>> best because it is fully compiled.
>> Here is this code with plotting the
>> solution:
>>
>> ***************************************
>> (* Runge-Kutta-4 routine *)
>> ClearAll[makeCompRK]
>> makeCompRK[f_] :=
>>  Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
>>   Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
>>    SolList = Table[x0, {n + 1}];
>>    Do[t = t0 + k h;
>>     K1 = h f[t, x];
>>     K2 = h f[t + (1/2) h, x + (1/2) K1];
>>     K3 = h f[t + (1/2) h, x + (1/2) K2];
>>     K4 = h f[t + h, x + K3];
>>     x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
>>     SolList[[k + 1]] = x, {k, 1, n}];
>>    SolList](*,Parallelization->True*), CompilationTarget -> "C",
>>   CompilationOptions -> {"InlineCompiledFunctions" -> True}]
>>
>> (* Defining equations *)
>> NN = 1000;
>> cRHS = With[{NN = NN}, Compile[{{t, _Real, 0}, {x, _Real, 1}},
>>     Table[-x[[i]]*
>>       Sin[0.1 t]^2/(1 +
>>         100 Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]^2), {i, 1, NN}]
>>     (*,
>>     CompilationTarget->"C"*)(*,
>>     CompilationOptions->{"InlineExternalDefinitions"->True}*)]];
>>
>> (*Compilation*)
>> tt0 = AbsoluteTime[];
>> Timing[RK4Comp = makeCompRK[cRHS];]
>> AbsoluteTime[] - tt0
>> (*CompilePrint[RK4Comp2]*)
>>
>> (*Setting parameters and Calculation*)
>> x0 = Table[
>>   RandomReal[{0, 1}], {i, 1, NN}]; t0 = 0; tMax = 300; n = 500;
>> tt0 = AbsoluteTime[];
>> Sol = RK4Comp[x0, t0, tMax, n];
>> AbsoluteTime[] - tt0
>>
>> Print["Compilation: ", Developer`PackedArrayQ@Sol]
>>
>> (* Plotting *)
>> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
>> x1List = Transpose[{tList, Transpose[Sol][[1]]}];
>> x2List = Transpose[{tList, Transpose[Sol][[2]]}];
>> x3List = Transpose[{tList, Transpose[Sol][[3]]}];
>> ListLinePlot[{x1List, x2List, x3List}, PlotMarkers -> Automatic,
>>  PlotStyle -> {Blue, Green, Red}, PlotRange -> {0, 1}]
>>
>> Best,
>>
>> Dmitry
>
> The execution time of the program above on my laptop today is 1.0 for
> compilation RK4 in Mathematica and 0.24 for compilation RK4 in C.

You get some further speed-up if you give the "RuntimeOptions" -> "Speed"
option to makeCompRK.

> For other compiled functions, it does not matter if the compilation target
> is C or Mathematica (why?).
>
> In my actual research program, I have a lot of external definitions
> and all of them have to be compiled. To model this situation, I have
> rewritten the same program with external definitions as follows:
>
> ............
>
> (* Defining equations *)
> NN = 1000;
> ff = Compile[{{t}}, Sin[0.1 t]^2];
> su = With[{NN = NN}, Compile[{{i, _Integer}, {x, _Real, 1}},
>     Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
> cRHS = With[{NN = NN}, Compile[{{t}, {x, _Real, 1}},
>     Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
>     CompilationOptions -> {"InlineExternalDefinitions" -> True,
>       "InlineCompiledFunctions" -> True}]];
> ...................................................
>
> Now the execution time is 2.2 for compilation in Mathematica and 1.42
> for compilation in C. We see there is a considerable slowdown because
> of the external definition (actually because of su). I wonder why does
> it happen?

If you look at CompilePrint[cRHS] you will see a CopyTensor that is in the
loop. That causes the slowdown. With some further optimizations, you could
write

su2 = With[{NN = NN},
   Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];

cRHS2 = With[{NN = NN},
   Compile[{{t, _Real, 0}, {x, _Real, 1}},
    -x*ff[t]/(1 + 100*su2[x]^2),
    CompilationTarget -> "C",
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

Then, the CopyTensor is outside of the loop.

Why is there a CopyTensor in the first place? Because su could be evil and
(e.g. via a call to MainEvaluate to a function that has Attribute HoldAll)
change the value of the argument. I have to see if that could be avoided.
I'll send an email if I find something.
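
One way to check the CopyTensor point on your own machine is sketched below. It is only a rough illustration, not a benchmark: ff, su and su2 follow the definitions in this thread, while cRHS1, cRHS2b, xTest and the repeat count of 1000 are names and values made up for this example. CompilationTarget -> "C" is left out so the sketch does not assume a C compiler is installed; absolute timings will therefore differ from the numbers quoted above.

```mathematica
(* Sketch only: cRHS1, cRHS2b and xTest are illustrative names,
   not taken from the original posts. *)
Needs["CompiledFunctionTools`"]  (* provides CompilePrint *)

NN = 1000;
ff = Compile[{{t, _Real}}, Sin[0.1 t]^2];

(* Per-element helper: called once per i from inside the Table loop *)
su = With[{NN = NN},
   Compile[{{i, _Integer}, {x, _Real, 1}},
    Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];

(* Whole-vector helper: called once per right-hand-side evaluation *)
su2 = With[{NN = NN},
   Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];

cRHS1 = With[{NN = NN},
   Compile[{{t, _Real}, {x, _Real, 1}},
    Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

cRHS2b = Compile[{{t, _Real}, {x, _Real, 1}},
   -x*ff[t]/(1 + 100*su2[x]^2),
   CompilationOptions -> {"InlineExternalDefinitions" -> True,
     "InlineCompiledFunctions" -> True}];

(* Read the listings and look for where CopyTensor appears
   relative to the loop instructions *)
CompilePrint[cRHS1]
CompilePrint[cRHS2b]

(* Both versions should agree up to rounding; then compare run times *)
xTest = RandomReal[{0, 1}, NN];
Max[Abs[cRHS1[0.5, xTest] - cRHS2b[0.5, xTest]]]
First@AbsoluteTiming[Do[cRHS1[0.5, xTest], {1000}]]
First@AbsoluteTiming[Do[cRHS2b[0.5, xTest], {1000}]]
```

The point of the comparison is the call pattern: cRHS1 hands the full vector x to su once per element, whereas cRHS2b hands it to su2 once per evaluation, so any defensive copy of the argument is made far less often.
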

> The external definitions are compiled and inlined in the
> compiled code of RK4Comp, thus, to my understanding the execution time
> should be the same. What is wrong here?
>

I think there might be another cave canem: The expression optimizer that
is called by the compiler may not be able to optimize as much if there are
several function calls instead of one.

Oliver

> Dmitry
>