MathGroup Archive: October 2011 [00207]

[Date Index] [Thread Index] [Author Index]

Re: Fully vectorized system of ODE's - any advantage of C?

To: mathgroup at smc.vnet.net
Subject: [mg121969] Re: Fully vectorized system of ODE's - any advantage of C?
From: Oliver Ruebenkoenig <ruebenko at wolfram.com>
Date: Fri, 7 Oct 2011 04:50:07 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com

On Tue, 4 Oct 2011, DmitryG wrote:

> Dear All,
>
> I was working on solving systems of a big number of ODE's with
> vectorization and compilation in C and I have received a great support
> in this forum. The speed crucially depends on the details of
> programming that not always can be detected by an average user (such
> as myself).
>
> I have realized that my vectorized codes were not yet completely
> vectorized because calls have been made to components of tensors. Then
> I've got an idea of a code that is completely vectorized, for a model
> of classical spins that is one of the subjects of my work. Here is the
> problem that, I believe, can be of interest to many Mathematica
> users.
>
> Dynamics of a chain of classical spins (described by three-component
> vectors of fixed length) precessing in the effective field created by
> neighbors
>
> ds[i]/dt==s[i]\[Cross]H[i];            H[i]=s[i-1]+s[i+1];
> i=1,2,..., NSpins
>
> Both the vector product and interaction with neighbors in H[i] can be
> very efficiently described by the RotateRight command. Putting all
> three components of all NSpins spins into the double list
>
> s={ {s[x,1],s[x,2],..s[x,NSpins]},  {s[y,1],s[y,2],..s[y,NSpins]},
> {s[z,1],sz,2],..s[z,NSpins]} },
>
> one can construct the code
>
> ************************************************************
> (* Runge-Kutta-4 routine *)
> ClearAll[makeCompRK4, makeCompRK5]
> makeCompRK4[f_] :=
> Compile[{{x0, _Real, 2}, {t0}, {tMax}, {n, _Integer}},
>  Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
>   SolList = Table[x0, {n + 1}];
>   Do[t = t0 + k h;
>    K1 = h f[t, x];
>    K2 = h f[t + (1/2) h, x + (1/2) K1];
>    K3 = h f[t + (1/2) h, x + (1/2) K2];
>    K4 = h f[t + h, x + K3];
>    x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
>    SolList[[k + 1]] = x, {k, 1, n}];
>   SolList](*,CompilationTarget->"C"*),
>  CompilationOptions -> {"InlineCompiledFunctions" -> True},
>  "RuntimeOptions" -> "Speed"]
>
> (* Defining equations *)
>
> RHS = Function[{t, s},
>   RotateRight[
>      s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) -
>    RotateRight[
>      s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])];
>
> cRHS = Compile[{t, {s, _Real, 2}},
>   RotateRight[
>      s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) -
>    RotateRight[
>      s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])];
>
> (*Compilation*)
> tt0 = AbsoluteTime[];
> Timing[RKComp = makeCompRK4[RHS];]
> AbsoluteTime[] - tt0
>
> (* Initial condition *)
> x0 = { Join[{1}, Table[0, {i, 2, NSpins}], {0}],
>  Join[{0}, Table[0, {i, 2, NSpins}], {0}],
>  Join[{0},
>   Table[1, {i, 2,
>     NSpins}], {0}] };  (* Padding with a zero spin at the NSpins+1 \
> position *)
>
> (* Parameters *)
> NSpins = 120;   t0 = 0; tMax = 150; n = 1000;
>
> (* Solving *)
> tt0 = AbsoluteTime[];
> Sol = RKComp[x0, t0, tMax, n];
> AbsoluteTime[] - tt0
>
> Print["Compilation: ", Developer`PackedArrayQ@Sol]
>
> (* Plotting *)
> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
> si\[Alpha]List[\[Alpha]_, i_] :=
>  Table[Sol[[k]][[\[Alpha], i]], {k, 0, n}];
> s1List = Transpose[{tList, si\[Alpha]List[3, 1]}];
> s2List = Transpose[{tList, si\[Alpha]List[3, 40]}];
> s3List = Transpose[{tList, si\[Alpha]List[3, 80]}];
> s4List = Transpose[{tList, si\[Alpha]List[3, NSpins]}];
> ListPlot[{s1List, s2List, s3List, s4List},
> PlotStyle -> {Blue, Green, Red, Orange}, PlotRange -> All]
>
> *************************************************************************************
>
> The code is very short and runs faster than all my previous codes for
> the same problem. There are some questions, however:
>
> 1) There is no difference in speed between uncompiled RHS and compiled
> cRHS.

Presumably because the code uses the same library code.

>
> 2) On my laptop (Windows 7, 64 bit), compilation in Mathematica
> results in 0.070 execution time, whereas compilation in C (Microsoft
> Visual C++) is longer, 0.075. In all previous not fully vectorized
> versions of the code compiling in C gave a speed advantage by a factor
> 2-3. Is it a perfect code that cannot be improved by C or, for some
> reason, compilation in C does not work and the system returns to the
> Mathematica compiler?

Hard to say, probably some profiling would tell.

>
> Oliver mentioned earlier that CopyTensor in the loop in the compiled
> code leads to slowdown. Here I do have CopyTensor in the loop. Could
> it be moved outside??
>

I do not think you can.

If you want to try what performance impact this has you could look at the 
generated C code and eliminate/modify those CopyTensor lines and then 
recompile (e.g. by hand ) and see what difference it makes.

Oliver

> I will be very greatful for all comments, as usual.
>
> Dmitry
>
>

Prev by Date: Re: A fast way to compare two vectors

Next by Date: Re: Fully vectorized system of ODE's - any advantage of C?

Previous by thread: Re: Fully vectorized system of ODE's - any advantage of C?

Next by thread: Re: Fully vectorized system of ODE's - any advantage of C?