Re: Fully vectorized system of ODE's - any advantage of C?

*To*: mathgroup at smc.vnet.net*Subject*: [mg121967] Re: Fully vectorized system of ODE's - any advantage of C?*From*: Oliver Ruebenkoenig <ruebenko at wolfram.com>*Date*: Fri, 7 Oct 2011 04:49:45 -0400 (EDT)*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com

On Wed, 5 Oct 2011, DmitryG wrote: > On Oct 4, 2:44 am, DmitryG <einsch... at gmail.com> wrote: >> Dear All, >> >> I was working on solving systems of a big number of ODE's with >> vectorization and compilation in C and I have received a great support >> in this forum. The speed crucially depends on the details of >> programming that not always can be detected by an average user (such >> as myself). >> >> I have realized that my vectorized codes were not yet completely >> vectorized because calls have been made to components of tensors. Then >> I've got an idea of a code that is completely vectorized, for a model >> of classical spins that is one of the subjects of my work. Here is the >> problem that, I believe, can be of interest to many Mathematica >> users. >> >> Dynamics of a chain of classical spins (described by three-component >> vectors of fixed length) precessing in the effective field created by >> neighbors >> >> ds[i]/dt==s[i]\[Cross]H[i]; H[i]=s[i-1]+s[i+= > 1]; >> i=1,2,..., NSpins >> >> Both the vector product and interaction with neighbors in H[i] can be >> very efficiently described by the RotateRight command. Putting all >> three components of all NSpins spins into the double list >> >> s={ {s[x,1],s[x,2],..s[x,NSpins]}, {s[y,1],s[y,2],..s[y,NSpins]}, >> {s[z,1],sz,2],..s[z,NSpins]} }, >> >> one can construct the code >> >> ************************************************************ >> (* Runge-Kutta-4 routine *) >> ClearAll[makeCompRK4, makeCompRK5] >> makeCompRK4[f_] := >> Compile[{{x0, _Real, 2}, {t0}, {tMax}, {n, _Integer}}, >> Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n= > ; >> SolList = Table[x0, {n + 1}]; >> Do[t = t0 + k h; >> K1 = h f[t, x]; >> K2 = h f[t + (1/2) h, x + (1/2) K1]; >> K3 = h f[t + (1/2) h, x + (1/2) K2]; >> K4 = h f[t + h, x + K3]; >> x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4; >> SolList[[k + 1]] = x, {k, 1, n}]; >> SolList](*,CompilationTarget->"C"*), >> CompilationOptions -> {"InlineCompiledFunctions" -> True}, >> "RuntimeOptions" -> "Speed"] >> >> (* Defining equations *) >> >> RHS = Function[{t, s}, >> RotateRight[ >> s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}])= > - >> RotateRight[ >> s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])= > ]; >> >> cRHS = Compile[{t, {s, _Real, 2}}, >> RotateRight[ >> s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}])= > - >> RotateRight[ >> s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])= > ]; >> >> (*Compilation*) >> tt0 = AbsoluteTime[]; >> Timing[RKComp = makeCompRK4[RHS];] >> AbsoluteTime[] - tt0 >> >> (* Initial condition *) >> x0 = { Join[{1}, Table[0, {i, 2, NSpins}], {0}], >> Join[{0}, Table[0, {i, 2, NSpins}], {0}], >> Join[{0}, >> Table[1, {i, 2, >> NSpins}], {0}] }; (* Padding with a zero spin at the NSpins= > +1 \ >> position *) >> >> (* Parameters *) >> NSpins = 120; t0 = 0; tMax = 150; n = 1000; >> >> (* Solving *) >> tt0 = AbsoluteTime[]; >> Sol = RKComp[x0, t0, tMax, n]; >> AbsoluteTime[] - tt0 >> >> Print["Compilation: ", Developer`PackedArrayQ@Sol] >> >> (* Plotting *) >> tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}]; >> si\[Alpha]List[\[Alpha]_, i_] := >> Table[Sol[[k]][[\[Alpha], i]], {k, 0, n}]; >> s1List = Transpose[{tList, si\[Alpha]List[3, 1]}]; >> s2List = Transpose[{tList, si\[Alpha]List[3, 40]}]; >> s3List = Transpose[{tList, si\[Alpha]List[3, 80]}]; >> s4List = Transpose[{tList, si\[Alpha]List[3, NSpins]}]; >> ListPlot[{s1List, s2List, s3List, s4List}, >> PlotStyle -> {Blue, Green, Red, Orange}, PlotRange -> All] >> >> *************************************************************************= > ** ********** >> >> The code is very short and runs faster than all my previous codes for >> the same problem. There are some questions, however: >> >> 1) There is no difference in speed between uncompiled RHS and compiled >> cRHS. >> >> 2) On my laptop (Windows 7, 64 bit), compilation in Mathematica >> results in 0.070 execution time, whereas compilation in C (Microsoft >> Visual C++) is longer, 0.075. In all previous not fully vectorized >> versions of the code compiling in C gave a speed advantage by a factor >> 2-3. Is it a perfect code that cannot be improved by C or, for some >> reason, compilation in C does not work and the system returns to the >> Mathematica compiler? >> >> Oliver mentioned earlier that CopyTensor in the loop in the compiled >> code leads to slowdown. Here I do have CopyTensor in the loop. Could >> it be moved outside?? >> >> I will be very greatful for all comments, as usual. >> >> Dmitry > > I have run this program many times with > > RKComp = makeCompRK4[RHS]; > > and > > RKComp = makeCompRK4[cRHS]; > > and also with compilation in C or in Mathematica, and I see that there > is no execution time difference that would stand out of fluctuations. > Thus I suspect that in the code above compilation in C does not work > and there is a fallback to the Mathematica compiler. > > Dmitry > > I suspect that there is not overhead until the code reaches the library function that is eventually evaluated in either C or the WVM. A profile would show this. Oliver

**Re: Fully vectorized system of ODE's - any advantage of C?**

**Re: Re: simplification**

**Re: Fully vectorized system of ODE's - any advantage of C?**

**Re: Compilation: Avoiding inlining**