MathGroup Archive: October 2011 [00070]

[Date Index] [Thread Index] [Author Index]

Fully vectorized system of ODE's - any advantage of C?

To: mathgroup at smc.vnet.net
Subject: [mg121839] Fully vectorized system of ODE's - any advantage of C?
From: DmitryG <einschlag at gmail.com>
Date: Tue, 4 Oct 2011 02:44:21 -0400 (EDT)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com

Dear All,

I was working on solving systems of a big number of ODE's with
vectorization and compilation in C and I have received a great support
in this forum. The speed crucially depends on the details of
programming that not always can be detected by an average user (such
as myself).

I have realized that my vectorized codes were not yet completely
vectorized because calls have been made to components of tensors. Then
I've got an idea of a code that is completely vectorized, for a model
of classical spins that is one of the subjects of my work. Here is the
problem that, I believe, can be of interest to many Mathematica
users.

Dynamics of a chain of classical spins (described by three-component
vectors of fixed length) precessing in the effective field created by
neighbors

ds[i]/dt==s[i]\[Cross]H[i];            H[i]=s[i-1]+s[i+1];
i=1,2,..., NSpins

Both the vector product and interaction with neighbors in H[i] can be
very efficiently described by the RotateRight command. Putting all
three components of all NSpins spins into the double list

s={ {s[x,1],s[x,2],..s[x,NSpins]},  {s[y,1],s[y,2],..s[y,NSpins]},
{s[z,1],sz,2],..s[z,NSpins]} },

one can construct the code

************************************************************
(* Runge-Kutta-4 routine *)
ClearAll[makeCompRK4, makeCompRK5]
makeCompRK4[f_] :=
 Compile[{{x0, _Real, 2}, {t0}, {tMax}, {n, _Integer}},
  Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n;
   SolList = Table[x0, {n + 1}];
   Do[t = t0 + k h;
    K1 = h f[t, x];
    K2 = h f[t + (1/2) h, x + (1/2) K1];
    K3 = h f[t + (1/2) h, x + (1/2) K2];
    K4 = h f[t + h, x + K3];
    x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4;
    SolList[[k + 1]] = x, {k, 1, n}];
   SolList](*,CompilationTarget->"C"*),
  CompilationOptions -> {"InlineCompiledFunctions" -> True},
  "RuntimeOptions" -> "Speed"]

(* Defining equations *)

RHS = Function[{t, s},
   RotateRight[
      s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) -
    RotateRight[
      s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])];

cRHS = Compile[{t, {s, _Real, 2}},
   RotateRight[
      s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) -
    RotateRight[
      s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])];

(*Compilation*)
tt0 = AbsoluteTime[];
Timing[RKComp = makeCompRK4[RHS];]
AbsoluteTime[] - tt0

(* Initial condition *)
x0 = { Join[{1}, Table[0, {i, 2, NSpins}], {0}],
  Join[{0}, Table[0, {i, 2, NSpins}], {0}],
  Join[{0},
   Table[1, {i, 2,
     NSpins}], {0}] };  (* Padding with a zero spin at the NSpins+1 \
position *)

(* Parameters *)
NSpins = 120;   t0 = 0; tMax = 150; n = 1000;

(* Solving *)
tt0 = AbsoluteTime[];
Sol = RKComp[x0, t0, tMax, n];
AbsoluteTime[] - tt0

Print["Compilation: ", Developer`PackedArrayQ@Sol]

(* Plotting *)
tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}];
si\[Alpha]List[\[Alpha]_, i_] :=
  Table[Sol[[k]][[\[Alpha], i]], {k, 0, n}];
s1List = Transpose[{tList, si\[Alpha]List[3, 1]}];
s2List = Transpose[{tList, si\[Alpha]List[3, 40]}];
s3List = Transpose[{tList, si\[Alpha]List[3, 80]}];
s4List = Transpose[{tList, si\[Alpha]List[3, NSpins]}];
ListPlot[{s1List, s2List, s3List, s4List},
 PlotStyle -> {Blue, Green, Red, Orange}, PlotRange -> All]

*************************************************************************************

The code is very short and runs faster than all my previous codes for
the same problem. There are some questions, however:

1) There is no difference in speed between uncompiled RHS and compiled
cRHS.

2) On my laptop (Windows 7, 64 bit), compilation in Mathematica
results in 0.070 execution time, whereas compilation in C (Microsoft
Visual C++) is longer, 0.075. In all previous not fully vectorized
versions of the code compiling in C gave a speed advantage by a factor
2-3. Is it a perfect code that cannot be improved by C or, for some
reason, compilation in C does not work and the system returns to the
Mathematica compiler?

Oliver mentioned earlier that CopyTensor in the loop in the compiled
code leads to slowdown. Here I do have CopyTensor in the loop. Could
it be moved outside??

I will be very greatful for all comments, as usual.

Dmitry

Prev by Date: Re: Solve - takes very long time

Next by Date: Re: Compilation: Avoiding inlining

Previous by thread: Re: Size of Control[] and OpenerView in Manipulate

Next by thread: Re: Fully vectorized system of ODE's - any advantage of C?