Re: Compilation: Avoiding inlining
*To*: mathgroup at smc.vnet.net
*Subject*: [mg121724] Re: Compilation: Avoiding inlining
*From*: Oliver Ruebenkoenig <ruebenko at wolfram.com>
*Date*: Tue, 27 Sep 2011 06:22:08 -0400 (EDT)
*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com
*References*: <201109250234.WAA26203@smc.vnet.net>
On Sat, 24 Sep 2011, DmitryG wrote:
> On Sep 23, 3:49 am, Oliver Ruebenkoenig <ruebe... at wolfram.com> wrote:
>> On Thu, 22 Sep 2011, DmitryG wrote:
>>> On Sep 21, 5:41 am, DmitryG <einsch... at gmail.com> wrote:
>>>> On Sep 20, 6:08 am, David Bailey <d... at removedbailey.co.uk> wrote:
>>
>>>>> On 16/09/2011 12:08, Oliver Ruebenkoenig wrote:
>>
>>>>>> On Fri, 16 Sep 2011, DmitryG wrote:
>>
>>>>>>> Here is a program with (* Definition of the equations *) made one-
>>>>>>> step, that performs the same as in my previous post....................
>>
>>>>> I tried pasting your example into Mathematica, but unfortunately there
>>>>> seems to be a variable 'x' which is undefined - presumably some input
>>>>> data. It might be worth posting a complete example, so that people can
>>>>> explore how to get decent performance.
>>
>>>>> Error:
>>
>>>>> Part::partd: "Part specification x[[1]] is longer than depth of object. "
>>
>>>>> David Bailey
>>>>> http://www.dbaileyconsultancy.co.uk
>>
>>>> Hi David,
>>
>>>> The RK4 procedure here works with the solution vector x whose initial
>>>> value is defined after the RK4 procedure and the equations are
>>>> defined. Mathematica does not know the length of x at the beginning
>>>> and this is why it complains. You can ignore these complaints. Of the
>>>> several codes posted above, that in Oliver's 19 September post is the
>>>> best because it is fully compiled. Here is this code with plotting the
>>>> solution:
>>
>>>> ***************************************
>>>> (* Runge-Kutta-4 routine *)
>>>> ClearAll[makeCompRK]
>>>> makeCompRK[f_] :=
>>>> Compile[{{x0, _Real, 1}, {t0, _Real}, {tMax, _Real}, {n, _Integer}},
>>>> Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
>>>> SolList = Table[x0, {n + 1}];
>>>> Do[t = t0 + k h;
>>>> K1 = h f[t, x];
>>>> K2 = h f[t + (1/2) h, x + (1/2) K1];
>>>> K3 = h f[t + (1/2) h, x + (1/2) K2];
>>>> K4 = h f[t + h, x + K3];
>>>> x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
>>>> SolList[[k + 1]] = x, {k, 1, n}];
>>>> SolList](*,Parallelization->True*), CompilationTarget -> "C",
>>>> CompilationOptions -> {"InlineCompiledFunctions" -> True}]
>>
>>>> (* Defining equations *)
>>>> NN = 1000;
>>>> cRHS = With[{NN = NN}, Compile[{{t, _Real, 0}, {x, _Real, 1}},
>>>> Table[-x[[i]]*
>>>> Sin[0.1 t]^2/(1 +
>>>> 100 Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]^2), {i,1, NN}]
>>>> (*,
>>>> CompilationTarget->"C"*)(*,
>>>> CompilationOptions->{"InlineExternalDefinitions"->True}*)]];
>>
>>>> (*Compilation*)
>>>> tt0 = AbsoluteTime[];
>>>> Timing[RK4Comp = makeCompRK[cRHS];]
>>>> AbsoluteTime[] - tt0
>>>> (*CompilePrint[RK4Comp]*)
>>
>>>> (*Setting parameters and Calculation*)
>>>> x0 = Table[
>>>> RandomReal[{0, 1}], {i, 1, NN}]; t0 = 0; tMax = 300; n = 500;
>>>> tt0 = AbsoluteTime[];
>>>> Sol = RK4Comp[x0, t0, tMax, n];
>>>> AbsoluteTime[] - tt0
>>
>>>> Print["Compilation: ", Developer`PackedArrayQ@Sol]
>>
>>>> (* Plotting *)
>>>> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
>>>> x1List = Transpose[{tList, Transpose[Sol][[1]]}];
>>>> x2List = Transpose[{tList, Transpose[Sol][[2]]}];
>>>> x3List = Transpose[{tList, Transpose[Sol][[3]]}];
>>>> ListLinePlot[{x1List, x2List, x3List}, PlotMarkers -> Automatic,
>>>> PlotStyle -> {Blue, Green, Red}, PlotRange -> {0, 1}]
>>
>>>> Best,
>>
>>>> Dmitry
>>
>>> The execution time of the program above on my laptop today is 1.0 for
>>> compilation RK4 in Mathematica and 0.24 for compilation RK4 in C. For
>>
>> You get some further speedup if you give the
>>
>> , "RuntimeOptions" -> "Speed"
>>
>> option to makeCompRK.
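
To make the placement of that option concrete, here is a minimal sketch on a toy function rather than on makeCompRK itself. The names sumSqSafe, sumSqFast and data are made up for this illustration. The assumption is that RuntimeOptions -> "Speed" trades some runtime checks (such as integer overflow detection) for speed, so whether it helps at all will depend on the code being compiled. In makeCompRK the option would sit next to CompilationTarget and CompilationOptions.

```mathematica
(* Sketch only: the same loop compiled with default runtime checks
   and with RuntimeOptions -> "Speed". Names are illustrative. *)
sumSqSafe = Compile[{{x, _Real, 1}},
   Module[{s = 0.}, Do[s += x[[i]]^2, {i, Length[x]}]; s]];

sumSqFast = Compile[{{x, _Real, 1}},
   Module[{s = 0.}, Do[s += x[[i]]^2, {i, Length[x]}]; s],
   RuntimeOptions -> "Speed"];

(* Compare wall-clock times on the same input *)
data = RandomReal[{0, 1}, 10^6];
First[AbsoluteTiming[sumSqSafe[data]]]
First[AbsoluteTiming[sumSqFast[data]]]
```

Both versions should return the same value; only the timing is expected to differ, and possibly not by much for a loop this simple.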
>>
>>> other compiled functions, it does not matter if the compilation target
>>> is C or Mathematica (why?).
>>
>>> In my actual research program, I have a lot of external definitions
>>> and all of them have to be compiled. To model this situation, I have
>>> rewritten the same program with external definitions as follows:
>>
>>> ............
>>
>>> (* Defining equations *)
>>> NN = 1000;
>>> ff = Compile[{{t}}, Sin[0.1 t]^2];
>>> su = With[{NN = NN}, Compile[{{i, _Integer}, {x, _Real, 1}},
>>> Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
>>> cRHS = With[{NN = NN}, Compile[{{t}, {x, _Real, 1}},
>>> Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}],
>>> CompilationOptions -> {"InlineExternalDefinitions" -> True,
>>> "InlineCompiledFunctions" -> True}]];
>>> ...................................................
>>
>>> Now the execution time is 2.2 for compilation in Mathematica and 1.42
>>> for compilation in C. We see there is a considerable slowdown because
>>> of the external definition (actually because of su). I wonder why does
>>> it happen?
>>
>> If you look at CompilePrint[cRHS] you will see a CopyTensor that is in the
>> loop. That causes the slowdown.
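
For anyone who wants to look at the compiled listing themselves, the following is a rough sketch of how one might search it for CopyTensor instructions. It uses small stand-ins (suDemo, rhsDemo and the size n = 8 are hypothetical names and values, not the definitions from the thread), and the exact listing can differ between Mathematica versions, so the filter may return an empty list on some systems.

```mathematica
Needs["CompiledFunctionTools`"]

(* Small stand-ins for su and cRHS; n = 8 is an arbitrary size *)
With[{n = 8},
  suDemo = Compile[{{i, _Integer}, {x, _Real, 1}},
    Sum[x[[i + j]], {j, 1, Min[4, n - i]}]];
  rhsDemo = Compile[{{x, _Real, 1}},
    Table[-x[[i]]/(1 + 100 suDemo[i, x]^2), {i, 1, n}],
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

(* CompilePrint returns the listing as a string *)
listing = CompilePrint[rhsDemo];

(* Keep only the lines of the listing that mention CopyTensor, if any *)
Select[StringSplit[listing, "\n"], ! StringFreeQ[#, "CopyTensor"] &]
```

Reading the full listing shows whether such an instruction sits inside the loop body or before it.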
>>
>> With some further optimizations, you could write
>>
>> su2 = With[{NN = NN},
>> Compile[{{x, _Real, 1}},
>> Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];
>>
>> cRHS2 = With[{NN = NN},
>> Compile[{{t, _Real, 0}, {x, _Real, 1}}, -x*
>> ff[t]/(1 + 100*su2[x]^2)
>> , CompilationTarget -> "C"
>> , CompilationOptions -> {"InlineExternalDefinitions" -> True,
>> "InlineCompiledFunctions" -> True}]];
>>
>> Then, the CopyTensor is outside of the loop.
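
As a sanity check on this kind of rewrite, one can confirm that the element-wise and the vectorized right-hand sides agree before comparing their speed. The sketch below does this on a small system. All names (ffD, suD, su2D, rhsLoop, rhsVec, xTest) and the size n = 50 are assumptions made for the illustration, and the default compilation target is used so that no C compiler is required.

```mathematica
(* Sketch: element-wise vs. vectorized right-hand side on a small system *)
With[{n = 50},
  ffD = Compile[{{t, _Real}}, Sin[0.1 t]^2];
  suD = Compile[{{i, _Integer}, {x, _Real, 1}},
    Sum[x[[i + j]], {j, 1, Min[4, n - i]}]];
  su2D = Compile[{{x, _Real, 1}},
    Table[Sum[x[[i + j]], {j, 1, Min[4, n - i]}], {i, 1, n}]];
  (* One call to suD per element, inside the Table loop *)
  rhsLoop = Compile[{{t, _Real}, {x, _Real, 1}},
    Table[-x[[i]]*ffD[t]/(1 + 100 suD[i, x]^2), {i, 1, n}],
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}];
  (* One call to su2D for the whole vector *)
  rhsVec = Compile[{{t, _Real}, {x, _Real, 1}},
    -x*ffD[t]/(1 + 100*su2D[x]^2),
    CompilationOptions -> {"InlineExternalDefinitions" -> True,
      "InlineCompiledFunctions" -> True}]];

xTest = RandomReal[{0, 1}, 50];

(* Largest absolute difference between the two results *)
Max[Abs[rhsLoop[0.7, xTest] - rhsVec[0.7, xTest]]]

(* Rough timing comparison over repeated calls *)
First[AbsoluteTiming[Do[rhsLoop[0.7, xTest], {10^4}]]]
First[AbsoluteTiming[Do[rhsVec[0.7, xTest], {10^4}]]]
```

The difference should be at the level of rounding error; the timings will vary with machine and version.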
>>
>> Why is there a CopyTensor in the first place? Because su could be evil and
>> (e.g. via a call to MainEvaluate to a function that has Attribute HoldAll)
>> change the value of the argument. I have to see if that could be avoided.
>> I'll send an email if I find something.
>>
>>> The external definitions are compiled and inlined in the
>>> compiled code of RK4Comp, thus, to my understanding the execution time
>>> should be the same. What is wrong here?
>>
>> I think there might be another cave canem: The expression optimizer that is
>> called by the compiler may not be able to optimize as much if there are
>> several function calls instead of one.
>>
>> Oliver
>>
>>> Dmitry
>
> Thank you Oliver! Great ideas, as usual!
>
> I was able to rewrite my actual research programs in this way and they
> run faster.
>
> A potentially very important question: I have noticed that the program
> we are discussing, when compiled in C, runs on both cores of my
> processor. No parallelization options have been set, so what is it?
> Automatic parallelization by the C compiler (I have Microsoft Visual
> under Windows 7)? Do you have this effect on your computer?
>
Only when compiled to C? You could try setting Parallelization -> False;
it might also be that MKL runs some stuff in parallel. Try
SetSystemOptions["MKLThreads" -> 1] and see if that helps.
I cannot test that myself since I use Linux/gcc.
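
A minimal sketch of the MKL experiment, assuming the "MKLThreads" system option is available in the installed version: remember the current value, restrict MKL to one thread for the timing run, then restore it. The variable name old is made up, and the timing line is only a placeholder comment.

```mathematica
(* Remember the current setting; SystemOptions returns a list of rules *)
old = SystemOptions["MKLThreads"];

(* Restrict MKL to a single thread for the experiment *)
SetSystemOptions["MKLThreads" -> 1];

(* ... run the timing of interest here, e.g. the call to RK4Comp ... *)

(* Check what is currently in effect *)
SystemOptions["MKLThreads"]

(* Restore the previous value *)
SetSystemOptions["MKLThreads" -> ("MKLThreads" /. old)];
```

If CPU usage still shows both cores busy with MKL limited to one thread, the parallelism is coming from somewhere else.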
> However, the programs of a different type, such as my research
> program, still run on one core of the processor. I don't see what
> makes the compiled program run in different ways, because they are
> written similarly.
>
I understand that you'd want to compare the generated code with the
handwritten code on the same number of threads, but I cannot resist
pointing out that parallelization of the C++ code is something that needs
to be developed, whereas parallelization via Mathematica comes at almost
no additional cost.
On a completely different note, here is another approach that could be
taken.
CCodeGenerator/tutorial/CodeGeneration
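
To give a flavor of what that tutorial covers, here is a minimal sketch that turns a compiled function into C source text with the CCodeGenerator package. The toy function sq and the variable cSource are hypothetical names chosen for the illustration. The generated code is meant to be built against the Wolfram runtime library, so it is not a standalone C program by itself.

```mathematica
Needs["CCodeGenerator`"]

(* A toy compiled function; "sq" is the name given to the generated C function *)
sq = Compile[{{x, _Real}}, x^2 + Sin[x]];

(* Generate the C source as a string instead of writing it to a file *)
cSource = CCodeStringGenerate[sq, "sq"];

(* Look at the beginning of the generated source *)
StringTake[cSource, Min[400, StringLength[cSource]]]
```

CCodeGenerate writes the same kind of output to a file, which is probably the more convenient form for a real project.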
Oliver