Re: Fully vectorized system of ODE's - any advantage of C?

*To*: mathgroup at smc.vnet.net*Subject*: [mg121969] Re: Fully vectorized system of ODE's - any advantage of C?*From*: Oliver Ruebenkoenig <ruebenko at wolfram.com>*Date*: Fri, 7 Oct 2011 04:50:07 -0400 (EDT)*Delivered-to*: l-mathgroup@mail-archive0.wolfram.com

On Tue, 4 Oct 2011, DmitryG wrote: > Dear All, > > I was working on solving systems of a big number of ODE's with > vectorization and compilation in C and I have received a great support > in this forum. The speed crucially depends on the details of > programming that not always can be detected by an average user (such > as myself). > > I have realized that my vectorized codes were not yet completely > vectorized because calls have been made to components of tensors. Then > I've got an idea of a code that is completely vectorized, for a model > of classical spins that is one of the subjects of my work. Here is the > problem that, I believe, can be of interest to many Mathematica > users. > > Dynamics of a chain of classical spins (described by three-component > vectors of fixed length) precessing in the effective field created by > neighbors > > ds[i]/dt==s[i]\[Cross]H[i]; H[i]=s[i-1]+s[i+1]; > i=1,2,..., NSpins > > Both the vector product and interaction with neighbors in H[i] can be > very efficiently described by the RotateRight command. Putting all > three components of all NSpins spins into the double list > > s={ {s[x,1],s[x,2],..s[x,NSpins]}, {s[y,1],s[y,2],..s[y,NSpins]}, > {s[z,1],sz,2],..s[z,NSpins]} }, > > one can construct the code > > ************************************************************ > (* Runge-Kutta-4 routine *) > ClearAll[makeCompRK4, makeCompRK5] > makeCompRK4[f_] := > Compile[{{x0, _Real, 2}, {t0}, {tMax}, {n, _Integer}}, > Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n; > SolList = Table[x0, {n + 1}]; > Do[t = t0 + k h; > K1 = h f[t, x]; > K2 = h f[t + (1/2) h, x + (1/2) K1]; > K3 = h f[t + (1/2) h, x + (1/2) K2]; > K4 = h f[t + h, x + K3]; > x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4; > SolList[[k + 1]] = x, {k, 1, n}]; > SolList](*,CompilationTarget->"C"*), > CompilationOptions -> {"InlineCompiledFunctions" -> True}, > "RuntimeOptions" -> "Speed"] > > (* Defining equations *) > > RHS = Function[{t, s}, > RotateRight[ > s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) - > RotateRight[ > s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])]; > > cRHS = Compile[{t, {s, _Real, 2}}, > RotateRight[ > s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) - > RotateRight[ > s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])]; > > (*Compilation*) > tt0 = AbsoluteTime[]; > Timing[RKComp = makeCompRK4[RHS];] > AbsoluteTime[] - tt0 > > (* Initial condition *) > x0 = { Join[{1}, Table[0, {i, 2, NSpins}], {0}], > Join[{0}, Table[0, {i, 2, NSpins}], {0}], > Join[{0}, > Table[1, {i, 2, > NSpins}], {0}] }; (* Padding with a zero spin at the NSpins+1 \ > position *) > > (* Parameters *) > NSpins = 120; t0 = 0; tMax = 150; n = 1000; > > (* Solving *) > tt0 = AbsoluteTime[]; > Sol = RKComp[x0, t0, tMax, n]; > AbsoluteTime[] - tt0 > > Print["Compilation: ", Developer`PackedArrayQ@Sol] > > (* Plotting *) > tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}]; > si\[Alpha]List[\[Alpha]_, i_] := > Table[Sol[[k]][[\[Alpha], i]], {k, 0, n}]; > s1List = Transpose[{tList, si\[Alpha]List[3, 1]}]; > s2List = Transpose[{tList, si\[Alpha]List[3, 40]}]; > s3List = Transpose[{tList, si\[Alpha]List[3, 80]}]; > s4List = Transpose[{tList, si\[Alpha]List[3, NSpins]}]; > ListPlot[{s1List, s2List, s3List, s4List}, > PlotStyle -> {Blue, Green, Red, Orange}, PlotRange -> All] > > ************************************************************************************* > > The code is very short and runs faster than all my previous codes for > the same problem. There are some questions, however: > > 1) There is no difference in speed between uncompiled RHS and compiled > cRHS. Presumably because the code uses the same library code. > > 2) On my laptop (Windows 7, 64 bit), compilation in Mathematica > results in 0.070 execution time, whereas compilation in C (Microsoft > Visual C++) is longer, 0.075. In all previous not fully vectorized > versions of the code compiling in C gave a speed advantage by a factor > 2-3. Is it a perfect code that cannot be improved by C or, for some > reason, compilation in C does not work and the system returns to the > Mathematica compiler? Hard to say, probably some profiling would tell. > > Oliver mentioned earlier that CopyTensor in the loop in the compiled > code leads to slowdown. Here I do have CopyTensor in the loop. Could > it be moved outside?? > I do not think you can. If you want to try what performance impact this has you could look at the generated C code and eliminate/modify those CopyTensor lines and then recompile (e.g. by hand ) and see what difference it makes. Oliver > I will be very greatful for all comments, as usual. > > Dmitry > >