Re: Compilation: Avoiding inlining
- To: mathgroup at smc.vnet.net
- Subject: [mg122008] Re: Compilation: Avoiding inlining
- From: DmitryG <einschlag at gmail.com>
- Date: Sun, 9 Oct 2011 03:50:35 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <201110060820.EAA22516@smc.vnet.net> <j6mfc6$7k8$1@smc.vnet.net>
On Oct 7, 5:05 am, Oliver Ruebenkoenig <ruebe... at wolfram.com> wrote:
> On Thu, 6 Oct 2011, DmitryG wrote:
> > On Oct 5, 4:15 am, "Oleksandr Rasputinov"
> > <oleksandr_rasputi... at hmamail.com> wrote:
> >> On Tue, 04 Oct 2011 07:45:30 +0100, DmitryG <einsch... at gmail.com> wrote:
> >>> On Sep 27, 6:24 am, Oliver Ruebenkoenig <ruebe... at wolfram.com> wrote:
> >>>> On Sat, 24 Sep 2011, DmitryG wrote:
>
> >>>>> A potentially very important question: I have noticed that the program
> >>>>> we are discussing, when compiled in C, runs on both cores of my
>
> >>>> Only when compiled to C? You could try to set Parallelization->False,
> >>>> and/or it might be that MKL runs some stuff in parallel.
>
> >>>> Try
>
> >>>> SetSystemOptions["MKLThreads" -> 1] and see if that helps.
>
> >>>>> processor. No parallelization options have been set, so what is it?
> >>>>> Automatic parallelization by the C compiler (I have Microsoft Visual
> >>>>> under Windows 7)? Do you have this effect on your computer?
>
> >>>> I cannot test that, since I use Linux/gcc.
>
> >>>>> However, programs of a different type, such as my research
> >>>>> program, still run on one core of the processor. I don't see what
> >>>>> makes the compiled programs run in different ways, because they are
> >>>>> written similarly.
>
> >>>> I understand that you'd want to compare the generated code with the
> >>>> handwritten code on the same number of threads, but I cannot resist
> >>>> pointing out that parallelization of the C++ code is something that
> >>>> needs to be developed, whereas parallelization via Mathematica comes
> >>>> at almost no additional cost.
>
> >>>> On a completely different note, here is another approach that could be
> >>>> taken:
>
> >>>> CCodeGenerator/tutorial/CodeGeneration
>
> >>>> Oliver
>
> >>> This behavior is not new to me. Calculating matrix exponentials also
> >>> leads to 100% processor usage on multiprocessor computers without
> >>> any parallelization. I have observed it on my Windows 7 laptop and on
> >>> a Mac Pro at work. The system monitor shows that only one Mathematica
> >>> kernel is working, but the load of this kernel is much greater than
> >>> 100%, especially on the Mac Pro, which has 8 cores. My laptop may
> >>> switch off (because of overheating?) during such calculations, while
> >>> the Mac is OK.
>
> >>> I have also seen such behavior when solving PDEs with NDSolve in some
> >>> cases.
>
> >>> I wonder what is happening, and I do not know whether this effect is
> >>> good or bad. As I cannot control it, I cannot measure whether such
> >>> extensive processor usage leads to a speed-up.
>
> >>> I am going to get Mathematica for Linux and test it there, too.
>
> >>> Best,
>
> >>> Dmitry
>
> >> In the case of the matrix exponentials (or really any numerical linear
> >> algebra), this behaviour is undoubtedly due to MKL threading and can be
> >> controlled by the option Oliver gives above. Obviously it is not good if
> >> your laptop switches off due to overheating, but this is not so much a
> >> problem of Mathematica as of badly designed cooling in the laptop. MKL's
> >> threading is carefully done and scales well for moderate numbers of
> >> cores, so you should be seeing considerably increased performance as a
> >> result of it on an 8-core machine. In regard to NDSolve, I don't know
> >> how it is implemented internally and so can't comment on any
> >> parallelization that might exist.
>
> > Thank you, Oleksandr!
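As an aside, the CCodeGenerator route Oliver points to above looks roughly
like this (a minimal sketch, assuming the CCodeGenerator` package interface
described in that tutorial; the function name "fun" is illustrative):

Needs["CCodeGenerator`"]
cf = Compile[{{x, _Real}}, x^2 + Sin[x^2]];
(* generate standalone C source for the compiled function as a string *)
src = CCodeStringGenerate[cf, "fun"];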
> > For some reason, I've overlooked Oliver's suggestion to try
> > SetSystemOptions["MKLThreads" -> 1].
>
> > MKL stands for Intel's Math Kernel Library (I have Intel processors in
> > both computers), and it seems to be a big thing if you can use it
> > right. It seems the Intel processor can parallelize problems in some
> > cases. But in which cases? How can we know? It would be very desirable
> > to be able to write code that allows this kind of automatic
> > parallelization.
>
> > I have made experiments:
>
> > SetSystemOptions["MKLThreads" -> 2];
> > NN = 1000;
> > AMatr = Table[RandomReal[{0, 1}], {i, 1, NN}, {j, 1, NN}];
> > AbsoluteTiming[MatrixExp[AMatr];]
>
> > Out[12]= {2.2281247, Null}
>
> > while
>
> > SetSystemOptions["MKLThreads" -> 1];
> > NN = 1000;
> > AMatr = Table[RandomReal[{0, 1}], {i, 1, NN}, {j, 1, NN}];
> > AbsoluteTiming[MatrixExp[AMatr];]
>
> > Out[16]= {3.6412015, Null}
>
> > That is, there is a speed-up, and we can control it. The same holds
> > for matrix multiplication.
>
> > On the other hand, Oliver's code of 23 September above runs on both
> > cores, and I cannot control it with SetSystemOptions["MKLThreads" -> 1].
> > Here is the complete code:
>
> Try forcing a non-parallel version with Parallelization -> False.
>
> The default is Automatic, which applies a heuristic to decide when code
> is run in parallel and when not.
>
> Does this help?
>
> Oliver
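A minimal sketch of Oliver's suggestion in isolation (note that in Compile,
Parallelization only takes effect together with RuntimeAttributes ->
{Listable}, when the compiled function is applied to a list of inputs; the
names cfPar and cfSer are illustrative):

cfPar = Compile[{{x, _Real}}, Sin[x]^2,
   RuntimeAttributes -> {Listable}, Parallelization -> True];
cfSer = Compile[{{x, _Real}}, Sin[x]^2,
   RuntimeAttributes -> {Listable}, Parallelization -> False];
data = RandomReal[1, 10^6];
AbsoluteTiming[cfPar[data];] (* may run on several cores *)
AbsoluteTiming[cfSer[data];] (* forced to stay serial *)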
> > *****************************************************
> > (* Runge-Kutta-4 routine *)
> > ClearAll[makeCompRK]
> > makeCompRK[f_] :=
> >   Compile[{{x0, _Real, 1}, {t0}, {tMax}, {n, _Integer}},
> >    Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
> >     SolList = Table[x0, {n + 1}];
> >     Do[t = t0 + k h;
> >      K1 = h f[t, x];
> >      K2 = h f[t + (1/2) h, x + (1/2) K1];
> >      K3 = h f[t + (1/2) h, x + (1/2) K2];
> >      K4 = h f[t + h, x + K3];
> >      x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
> >      SolList[[k + 1]] = x, {k, 1, n}];
> >     SolList](*,Parallelization->True*), CompilationTarget -> "C",
> >    CompilationOptions -> {"InlineCompiledFunctions" -> True},
> >    "RuntimeOptions" -> "Speed"]
>
> > (* Defining equations *)
> > NN = 10000;
> > ff = Compile[{t},
> >     Sin[0.1 t]^2]; (* ff is inserted into cRHS, the way to go *)
>
> > su = With[{NN = NN},
> >     Compile[{{i, _Integer}, {x, _Real, 1}},
> >      Sum[x[[i + j]], {j, 1, Min[4, NN - i]}]]];
> > cRHS = With[{NN = NN}, Compile[{{t}, {x, _Real, 1}},
> >     Table[-x[[i]]*ff[t]/(1 + 100 su[i, x]^2), {i, 1, NN}](*,
> >      CompilationTarget->"C"*),
> >     CompilationOptions -> {"InlineExternalDefinitions" -> True,
> >       "InlineCompiledFunctions" -> True}]];
>
> > (* With this trick it runs faster *)
> > su2 = With[{NN = NN},
> >     Compile[{{x, _Real, 1}},
> >      Table[Sum[x[[i + j]], {j, 1, Min[4, NN - i]}], {i, 1, NN}]]];
> > cRHS2 = With[{NN = NN},
> >     Compile[{t, {x, _Real, 1}}, -x*ff[t]/(1 + 100*su2[x]^2),
> >      CompilationTarget -> "C",
> >      CompilationOptions -> {"InlineExternalDefinitions" -> True,
> >        "InlineCompiledFunctions" -> True}]];
>
> > (* Compilation *)
> > tt0 = AbsoluteTime[];
> > Timing[RK4Comp = makeCompRK[cRHS2];]
> > AbsoluteTime[] - tt0
> > (* CompilePrint[RK4Comp] *)
> > (* switch inlining to True/False to see what is happening *)
>
> > (* Setting parameters and calculation *)
> > x0 = Table[RandomReal[{0, 1}], {i, 1, NN}]; t0 = 0; tMax = 300; n = 500;
> > tt0 = AbsoluteTime[];
> > Sol = RK4Comp[x0, t0, tMax, n];
> > AbsoluteTime[] - tt0
>
> > Print["Compilation: ", Developer`PackedArrayQ@Sol]
>
> > (* Plotting *)
> > tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
> > x1List = Transpose[{tList, Transpose[Sol][[1]]}];
> > x2List = Transpose[{tList, Transpose[Sol][[2]]}];
> > x3List = Transpose[{tList, Transpose[Sol][[3]]}];
> > ListPlot[{x1List, x2List, x3List}, PlotStyle -> {Blue, Green, Red},
> >  PlotRange -> All]
>
> > ****************************************************************
>
> > My initial code uses su and cRHS, while Oliver's faster code uses su2
> > and cRHS2. Here, SetSystemOptions["MKLThreads" -> 1] does not affect
> > the core usage, so I do not know whether there is automatic
> > parallelization via MKL here or whether it is a parasitic effect that
> > does not lead to a speed-up.
>
> > I am quite intrigued now!
>
> > Best regards,
>
> > Dmitry

No, on my computer (Windows 7, Intel Core Duo) these commands do not
change anything, and the processor load is always 100% with cRHS2.

Dmitry
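For anyone reproducing this, it may be worth confirming in the same session
that the option actually took effect (a minimal check, assuming "MKLThreads"
is a valid system option in your version):

SystemOptions["MKLThreads"] (* query the current setting *)
SetSystemOptions["MKLThreads" -> 1];
SystemOptions["MKLThreads"] (* should now report 1 *)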