Re: Compilation: Avoiding inlining

*To*: mathgroup at smc.vnet.net
*Subject*: [mg121665] Re: Compilation: Avoiding inlining
*From*: DmitryG <einschlag at gmail.com>
*Date*: Sat, 24 Sep 2011 22:34:35 -0400 (EDT)
*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com
*References*: <201109221127.HAA26804@smc.vnet.net> <j5hdm4$7k3$1@smc.vnet.net>

On Sep 23, 3:49 am, Oliver Ruebenkoenig <ruebe... at wolfram.com> wrote:
> On Thu, 22 Sep 2011, DmitryG wrote:
> > On Sep 21, 5:41 am, DmitryG <einsch... at gmail.com> wrote:
> >> On Sep 20, 6:08 am, David Bailey <d... at removedbailey.co.uk> wrote:
>
> >>> On 16/09/2011 12:08, Oliver Ruebenkoenig wrote:
>
> >>>> On Fri, 16 Sep 2011, DmitryG wrote:
>
> >>>>> Here is a program with (* Definition of the equations *) made
> >>>>> one-step, that performs the same as in my previous post....................
>
> >>> I tried pasting your example into Mathematica, but unfortunately there
> >>> seems to be a variable 'x' which is undefined - presumably some input
> >>> data. It might be worth posting a complete example, so that people can
> >>> explore how to get decent performance.
>
> >>> Error:
>
> >>> Part::partd: "Part specification x[[1]] is longer than depth of object. "
>
> >>> David Bailey
> >>> http://www.dbaileyconsultancy.co.uk
>
> >> Hi David,
>
> >> The RK4 procedure here works with the solution vector x, whose initial
> >> value is defined after the RK4 procedure and the equations are
> >> defined. Mathematica does not know the length of x at the beginning,
> >> and this is why it complains. You can ignore these complaints. Of the
> >> several codes posted above, the one in Oliver's 19 September post is the
> >> best because it is fully compiled.
> >> Here is this code, with a plot of the solution:
>
> >> ***************************************
> >> (* Runge-Kutta-4 routine *)
> >> ClearAll[makeCompRK]
> >> makeCompRK[f_] :=
> >>  Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
> >>   Module[{h, K1, K2, K3, K4, SolList, x = x0, t},
> >>    h = (tMax - t0)/n;
> >>    SolList = Table[x0, {n + 1}];
> >>    Do[t = t0 + (k - 1) h; (* time at the start of step k *)
> >>     K1 = h f[t, x];
> >>     K2 = h f[t + (1/2) h, x + (1/2) K1];
> >>     K3 = h f[t + (1/2) h, x + (1/2) K2];
> >>     K4 = h f[t + h, x + K3];
> >>     x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
> >>     SolList[[k + 1]] = x, {k, 1, n}];
> >>    SolList](*,Parallelization->True*),
> >>   CompilationTarget -> "C",
> >>   CompilationOptions -> {"InlineCompiledFunctions" -> True}]
>
> >> (* Defining equations *)
> >> NN = 1000;
> >> cRHS = With[{NN = NN}, Compile[{{t, _Real, 0}, {x, _Real, 1}},
> >>     Table[-x[[i]]*
> >>       Sin[0.1 t]^2/(1 + 100 Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]^2),
> >>      {i, 1, NN}]
> >>     (*, CompilationTarget->"C"*)(*,
> >>     CompilationOptions->{"InlineExternalDefinitions"->True}*)]];
>
> >> (*Compilation*)
> >> tt0 = AbsoluteTime[];
> >> Timing[RK4Comp = makeCompRK[cRHS];]
> >> AbsoluteTime[] - tt0
> >> (*CompilePrint[RK4Comp2]*)
>
> >> (*Setting parameters and Calculation*)
> >> x0 = Table[RandomReal[{0, 1}], {i, 1, NN}]; t0 = 0; tMax = 300; n = 500;
> >> tt0 = AbsoluteTime[];
> >> Sol = RK4Comp[x0, t0, tMax, n];
> >> AbsoluteTime[] - tt0
>
> >> Print["Compilation: ", Developer`PackedArrayQ@Sol]
>
> >> (* Plotting *)
> >> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
> >> x1List = Transpose[{tList, Transpose[Sol][[1]]}];
> >> x2List = Transpose[{tList, Transpose[Sol][[2]]}];
> >> x3List = Transpose[{tList, Transpose[Sol][[3]]}];
> >> ListLinePlot[{x1List, x2List, x3List}, PlotMarkers -> Automatic,
> >>  PlotStyle -> {Blue, Green, Red}, PlotRange -> {0, 1}]
>
> >> Best,
>
> >> Dmitry
>
> > The execution time of the program above on my laptop today is 1.0 s when
> > RK4 is compiled to Mathematica's virtual machine and 0.24 s when it is
> > compiled to C. For
>
> You get some further speed-up if you give the
>
> , "RuntimeOptions" -> "Speed"
>
> option to makeCompRK.
>
> > other compiled functions, it does not matter whether the compilation
> > target is C or Mathematica (why?).
>
> > In my actual research program, I have a lot of external definitions,
> > and all of them have to be compiled. To model this situation, I have
> > rewritten the same program with external definitions as follows:
>
> > ............
>
> > (* Defining equations *)
> > NN = 1000;
> > ff = Compile[{{t}}, Sin[0.1 t]^2];
> > su = With[{NN = NN}, Compile[{{i, _Integer}, {x, _Real, 1}},
> >     Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
> > cRHS = With[{NN = NN}, Compile[{{t}, {x, _Real, 1}},
> >     Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
> >     CompilationOptions -> {"InlineExternalDefinitions" -> True,
> >       "InlineCompiledFunctions" -> True}]];
> > ...................................................
>
> > Now the execution time is 2.2 s for compilation in Mathematica and 1.42 s
> > for compilation in C. There is a considerable slowdown because of the
> > external definitions (actually because of su). Why does this happen?
>
> If you look at CompilePrint[cRHS] you will see a CopyTensor that is in the
> loop. That causes the slowdown.
>
> With some further optimizations, you could write
>
> su2 = With[{NN = NN},
>    Compile[{{x, _Real, 1}},
>     Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];
>
> cRHS2 = With[{NN = NN},
>    Compile[{{t, _Real, 0}, {x, _Real, 1}}, -x*
>       ff[t]/(1 + 100*su2[x]^2)
>     , CompilationTarget -> "C"
>     , CompilationOptions -> {"InlineExternalDefinitions" -> True,
>       "InlineCompiledFunctions" -> True}]];
>
> Then, the CopyTensor is outside of the loop.
>
> Why is there a CopyTensor in the first place? Because su could be evil and
> (e.g. via a call to MainEvaluate to a function that has Attribute HoldAll)
> change the value of the argument. I have to see if that could be avoided.
> I'll send an email if I find something.
>
> > The external definitions are compiled and inlined in the
> > compiled code of RK4Comp, so to my understanding the execution time
> > should be the same. What is wrong here?
>
> I think there might be another cave canem: the expression optimizer that is
> called by the compiler may not be able to optimize as much if there are
> several function calls instead of one.
>
> Oliver
>
> > Dmitry

Thank you, Oliver! Great ideas, as usual! I was able to rewrite my actual research programs in this way, and they run faster.

A potentially very important question: I have noticed that the program we are discussing, when compiled to C, runs on both cores of my processor. No parallelization options have been set, so what is causing this? Automatic parallelization by the C compiler (I use Microsoft Visual C++ under Windows 7)? Do you see this effect on your computer? However, programs of a different type, such as my research program, still run on a single core. I don't see what makes the compiled programs behave differently, since they are written similarly.
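A minimal sketch, assuming a small NN = 10, the default (virtual-machine) compilation target so that no C compiler is needed, an arbitrary test time t = 1.3, and a hypothetical data vector named xs: it checks that the element-wise right-hand side built from su and the vectorized one built from su2 return the same values, and then prints the compiled instructions of the element-wise version with CompilePrint (from the CompiledFunctionTools` package), where the CopyTensor Oliver describes can be looked for. The scalar argument of ff is written here as {t, _Real}; that is an assumption, not the thread's exact spelling.

```mathematica
(* Sketch: element-wise vs. vectorized right-hand side.
   Assumptions: NN = 10, default WVM target (no CompilationTarget -> "C"),
   test time 1.3, hypothetical data vector xs. *)
NN = 10;
ff = Compile[{{t, _Real}}, Sin[0.1 t]^2];

(* element-wise neighbour sum: called once per index i *)
su = With[{NN = NN}, Compile[{{i, _Integer}, {x, _Real, 1}},
    Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];

(* vectorized neighbour sum: called once per right-hand-side evaluation *)
su2 = With[{NN = NN}, Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];

cRHS = With[{NN = NN}, Compile[{{t, _Real}, {x, _Real, 1}},
    Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

cRHS2 = Compile[{{t, _Real}, {x, _Real, 1}},
   -x*ff[t]/(1 + 100*su2[x]^2),
   CompilationOptions -> {"InlineExternalDefinitions" -> True,
     "InlineCompiledFunctions" -> True}];

xs = RandomReal[{0, 1}, NN];

(* both formulations should agree up to rounding error *)
Max[Abs[cRHS[1.3, xs] - cRHS2[1.3, xs]]]

(* print the compiled instructions; look for CopyTensor inside the loop *)
Needs["CompiledFunctionTools`"]
CompilePrint[cRHS]
```

Once the two versions agree, setting NN back to 1000 and adding CompilationTarget -> "C" recovers the configuration that was timed in the thread. The exact CompilePrint listing may differ between Mathematica versions, so it is worth reading it rather than relying on a fixed instruction count.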

**Follow-Ups**: **Re: Compilation: Avoiding inlining**, *From:* Oliver Ruebenkoenig <ruebenko@wolfram.com>

**References**: **Re: Compilation: Avoiding inlining**, *From:* DmitryG <einschlag@gmail.com>
