Re: Compilation: Avoiding inlining

• To: mathgroup at smc.vnet.net
• Subject: [mg121640] Re: Compilation: Avoiding inlining
• From: Oliver Ruebenkoenig <ruebenko at wolfram.com>
• Date: Fri, 23 Sep 2011 03:44:41 -0400 (EDT)
• Delivered-to: l-mathgroup@mail-archive0.wolfram.com
• References: <201109221127.HAA26804@smc.vnet.net>

```
On Thu, 22 Sep 2011, DmitryG wrote:

> On Sep 21, 5:41 am, DmitryG <einsch... at gmail.com> wrote:
>> On Sep 20, 6:08 am, David Bailey <d... at removedbailey.co.uk> wrote:
>>
>>> On 16/09/2011 12:08, Oliver Ruebenkoenig wrote:
>>
>>>> On Fri, 16 Sep 2011, DmitryG wrote:
>>
>>>>> Here is a program with (* Definition of the equations *) made one-
>>>>> step, that performs the same as in my previous post....................
>>
>>> I tried pasting your example into Mathematica, but unfortunately there
>>> seems to be a variable 'x' which is undefined - presumably some input
>>> data. It might be worth posting a complete example, so that people can
>>> explore how to get decent performance.
>>
>>> Error:
>>
>>> Part::partd: "Part specification x[[1]] is longer than depth of object. "
>>
>>> David Bailey http://www.dbaileyconsultancy.co.uk
>>
>> Hi David,
>>
>> The RK4 procedure here works with the solution vector x whose initial
>> value is defined after the RK4 procedure and the equations are
>> defined. Mathematica does not know the length of x at the beginning
>> and this is why it complains. You can ignore these complaints. Of the
>> several codes posted above, that in Oliver's 19 September post is the
>> best because it is fully compiled. Here is this code with plotting the
>> solution:
>>
>> ***************************************
>> (* Runge-Kutta-4 routine *)
>> ClearAll[makeCompRK]
>> makeCompRK[f_] :=
>>  Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
>>   Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
>>    SolList = Table[x0, {n + 1}];
>>    Do[t = t0 + (k - 1) h;
>>     K1 = h f[t, x];
>>     K2 = h f[t + (1/2) h, x + (1/2) K1];
>>     K3 = h f[t + (1/2) h, x + (1/2) K2];
>>     K4 = h f[t + h, x + K3];
>>     x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
>>     SolList[[k + 1]] = x, {k, 1, n}];
>>    SolList](*,Parallelization->True*), CompilationTarget -> "C",
>>   CompilationOptions -> {"InlineCompiledFunctions" -> True}]
>>
>> (* Defining equations *)
>> NN = 1000;
>> cRHS = With[{NN = NN}, Compile[{{t, _Real, 0}, {x, _Real, 1}},
>>     Table[-x[[i]]*
>>       Sin[0.1 t]^2/(1 +
>>          100 Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]^2), {i, 1, NN}]
>> (*,
>>     CompilationTarget->"C"*)(*,
>>     CompilationOptions->{"InlineExternalDefinitions"->True}*)]];
>>
>> (*Compilation*)
>> tt0 = AbsoluteTime[];
>> Timing[RK4Comp = makeCompRK[cRHS];]
>> AbsoluteTime[] - tt0
>> (*Needs["CompiledFunctionTools`"]; CompilePrint[RK4Comp]*)
>>
>> (*Setting parameters and Calculation*)
>> x0 = Table[
>>   RandomReal[{0, 1}], {i, 1, NN}]; t0 = 0; tMax = 300; n = 500;
>> tt0 = AbsoluteTime[];
>> Sol = RK4Comp[x0, t0, tMax, n];
>> AbsoluteTime[] - tt0
>>
>> Print["Compilation: ", Developer`PackedArrayQ@Sol]
>>
>> (* Plotting *)
>> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
>> x1List = Transpose[{tList, Transpose[Sol][[1]]}];
>> x2List = Transpose[{tList, Transpose[Sol][[2]]}];
>> x3List = Transpose[{tList, Transpose[Sol][[3]]}];
>> ListLinePlot[{x1List, x2List, x3List}, PlotMarkers -> Automatic,
>>  PlotStyle -> {Blue, Green, Red}, PlotRange -> {0, 1}]
>>
>> Best,
>>
>> Dmitry
>
> The execution time of the program above on my laptop today is 1.0 for
> compilation RK4 in Mathematica and  0.24 for compilation RK4 in C. For

You get some further speed-up if you give the

, "RuntimeOptions" -> "Speed"

option to makeCompRK.

> other compiled functions, it does not matter if the compilation target
> is C or Mathematica (why?).
>
> In my actual research program, I have a lot of external definitions
> and all of them have to be compiled. To model this situation, I have
> rewritten the same program with external definitions as follows:
>
> ............
>
> (* Defining equations *)
> NN = 1000;
> ff = Compile[{{t}}, Sin[0.1 t]^2];
> su = With[{NN = NN}, Compile[{{i, _Integer}, {x, _Real, 1}},
>     Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
> cRHS = With[{NN = NN}, Compile[{{t}, {x, _Real, 1}},
>     Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
>     CompilationOptions -> {"InlineExternalDefinitions" -> True,
>       "InlineCompiledFunctions" -> True}]];
> ...................................................
>
> Now the execution time is 2.2 for compilation in Mathematica and 1.42
> for compilation in C. We see there is a considerable slowdown because
> of the external definition (actually because of su). I wonder why this
> happens.

If you look at CompilePrint[cRHS] you will see a CopyTensor that is in the
loop. That causes the slowdown.

With some further optimizations, you could write

su2 = With[{NN = NN},
   Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];

cRHS2 = With[{NN = NN},
   Compile[{{t, _Real, 0}, {x, _Real, 1}},
    -x*ff[t]/(1 + 100*su2[x]^2),
    CompilationTarget -> "C",
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

Then, the CopyTensor is outside of the loop.

Why is there a CopyTensor in the first place? Because su could be evil and
change the value of its argument (e.g. via a call through MainEvaluate to a
function that has the attribute HoldAll). I have to see whether that can be
avoided. I'll send an email if I find something.

> The external definitions are compiled and inlined in the
> compiled code of RK4Comp, thus, to my understanding the execution time
> should be the same. What is wrong here?
>

I think there might be another caveat: the expression optimizer that is
called by the compiler may not be able to optimize as much if there are
several function calls instead of one.

Oliver

> Dmitry
>
>

```
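
The RuntimeOptions suggestion in the reply above could look like the following minimal sketch. It repeats the RK4 routine from the quoted post with the option added, written in the symbol form RuntimeOptions -> "Speed". The names makeCompRKSpeed, rhs, rk and sol are hypothetical, the test equation x' = -x is an assumption chosen only because its exact solution is known, and CompilationTarget -> "C" is left out so that the sketch does not require a C compiler. In this sketch step k advances from t0 + (k - 1) h to t0 + k h.

```mathematica
(* Sketch only: RK4 routine from the thread with RuntimeOptions -> "Speed".
   makeCompRKSpeed, rhs, rk and sol are hypothetical names. *)
ClearAll[makeCompRKSpeed]
makeCompRKSpeed[f_] :=
 Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
  Module[{h, K1, K2, K3, K4, SolList, x = x0, t},
   h = (tMax - t0)/n;
   SolList = Table[x0, {n + 1}];
   Do[
    t = t0 + (k - 1) h;  (* start time of step k *)
    K1 = h f[t, x];
    K2 = h f[t + h/2, x + K1/2];
    K3 = h f[t + h/2, x + K2/2];
    K4 = h f[t + h, x + K3];
    x = x + (K1 + 2 K2 + 2 K3 + K4)/6;
    SolList[[k + 1]] = x,
    {k, 1, n}];
   SolList],
  RuntimeOptions -> "Speed",
  CompilationOptions -> {"InlineCompiledFunctions" -> True}]

(* Assumed test problem: x' = -x, exact solution x0 Exp[-t] *)
rhs = Compile[{{t, _Real}, {x, _Real, 1}}, -x];
rk = makeCompRKSpeed[rhs];
sol = rk[{1., 2.}, 0., 1., 100];

Dimensions[sol]  (* n + 1 rows, one per time point *)
Max[Abs[Last[sol] - {1., 2.} Exp[-1.]]]  (* deviation from the exact solution *)
```

"Speed" relaxes some of the runtime checks that compiled code normally performs, so it is worth comparing results against a run without the option before relying on it.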

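The CopyTensor discussion above can be checked with a small, self-contained sketch. It defines ff, su and cRHS as in the quoted post and su2 and cRHS2 as in the reply, confirms that both right-hand sides agree on a random input, and counts how often "CopyTensor" appears in each CompilePrint listing. The names xTest, maxDiff, copies, t1 and t2 are hypothetical, the evaluation point t = 0.3 and the repetition count are arbitrary assumptions, and CompilationTarget -> "C" is omitted so that no C compiler is needed.

```mathematica
Needs["CompiledFunctionTools`"]  (* provides CompilePrint *)

NN = 1000;
ff = Compile[{{t, _Real}}, Sin[0.1 t]^2];

(* Variant from the quoted post: su is called once per i inside Table *)
su = With[{NN = NN},
   Compile[{{i, _Integer}, {x, _Real, 1}},
    Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
cRHS = With[{NN = NN},
   Compile[{{t, _Real}, {x, _Real, 1}},
    Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

(* Variant from the reply: su2 returns the whole vector of sums at once *)
su2 = With[{NN = NN},
   Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];
cRHS2 = Compile[{{t, _Real}, {x, _Real, 1}},
   -x*ff[t]/(1 + 100*su2[x]^2),
   CompilationOptions -> {"InlineExternalDefinitions" -> True,
     "InlineCompiledFunctions" -> True}];

(* Both variants should agree up to rounding *)
xTest = RandomReal[{0, 1}, NN];
maxDiff = Max[Abs[cRHS[0.3, xTest] - cRHS2[0.3, xTest]]]

(* Static count of CopyTensor instructions in each listing *)
copies = StringCount[CompilePrint[#], "CopyTensor"] & /@ {cRHS, cRHS2}

(* Rough timing of repeated evaluations *)
t1 = First@AbsoluteTiming[Do[cRHS[0.3, xTest], {200}]];
t2 = First@AbsoluteTiming[Do[cRHS2[0.3, xTest], {200}]];
{t1, t2}
```

The count is static: it does not say whether an instruction sits inside the loop. For that, read the CompilePrint[cRHS] listing itself and look at where CopyTensor appears relative to the loop labels.
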