Fully vectorized system of ODE's - any advantage of C?
- To: mathgroup at smc.vnet.net
- Subject: [mg121839] Fully vectorized system of ODE's - any advantage of C?
- From: DmitryG <einschlag at gmail.com>
- Date: Tue, 4 Oct 2011 02:44:21 -0400 (EDT)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
Dear All, I was working on solving systems of a big number of ODE's with vectorization and compilation in C and I have received a great support in this forum. The speed crucially depends on the details of programming that not always can be detected by an average user (such as myself). I have realized that my vectorized codes were not yet completely vectorized because calls have been made to components of tensors. Then I've got an idea of a code that is completely vectorized, for a model of classical spins that is one of the subjects of my work. Here is the problem that, I believe, can be of interest to many Mathematica users. Dynamics of a chain of classical spins (described by three-component vectors of fixed length) precessing in the effective field created by neighbors ds[i]/dt==s[i]\[Cross]H[i]; H[i]=s[i-1]+s[i+1]; i=1,2,..., NSpins Both the vector product and interaction with neighbors in H[i] can be very efficiently described by the RotateRight command. Putting all three components of all NSpins spins into the double list s={ {s[x,1],s[x,2],..s[x,NSpins]}, {s[y,1],s[y,2],..s[y,NSpins]}, {s[z,1],sz,2],..s[z,NSpins]} }, one can construct the code ************************************************************ (* Runge-Kutta-4 routine *) ClearAll[makeCompRK4, makeCompRK5] makeCompRK4[f_] := Compile[{{x0, _Real, 2}, {t0}, {tMax}, {n, _Integer}}, Module[{h, K1, K2, K3, K4, SolList, x = x0, t}, h = (tMax - t0)/n; SolList = Table[x0, {n + 1}]; Do[t = t0 + k h; K1 = h f[t, x]; K2 = h f[t + (1/2) h, x + (1/2) K1]; K3 = h f[t + (1/2) h, x + (1/2) K2]; K4 = h f[t + h, x + K3]; x = x + (1/6) K1 + (1/3) K2 + (1/3) K3 + (1/6) K4; SolList[[k + 1]] = x, {k, 1, n}]; SolList](*,CompilationTarget->"C"*), CompilationOptions -> {"InlineCompiledFunctions" -> True}, "RuntimeOptions" -> "Speed"] (* Defining equations *) RHS = Function[{t, s}, RotateRight[ s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) - RotateRight[ s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])]; cRHS = Compile[{t, {s, _Real, 2}}, RotateRight[ s, {1, 0}] (RotateRight[s, {2, 1}] + RotateRight[s, {2, -1}]) - RotateRight[ s, {2, 0}] (RotateRight[s, {1, 1}] + RotateRight[s, {1, -1}])]; (*Compilation*) tt0 = AbsoluteTime[]; Timing[RKComp = makeCompRK4[RHS];] AbsoluteTime[] - tt0 (* Initial condition *) x0 = { Join[{1}, Table[0, {i, 2, NSpins}], {0}], Join[{0}, Table[0, {i, 2, NSpins}], {0}], Join[{0}, Table[1, {i, 2, NSpins}], {0}] }; (* Padding with a zero spin at the NSpins+1 \ position *) (* Parameters *) NSpins = 120; t0 = 0; tMax = 150; n = 1000; (* Solving *) tt0 = AbsoluteTime[]; Sol = RKComp[x0, t0, tMax, n]; AbsoluteTime[] - tt0 Print["Compilation: ", Developer`PackedArrayQ@Sol] (* Plotting *) tList = Table[1. t0 + (tMax - t0) k/n, {k, 0, n}]; si\[Alpha]List[\[Alpha]_, i_] := Table[Sol[[k]][[\[Alpha], i]], {k, 0, n}]; s1List = Transpose[{tList, si\[Alpha]List[3, 1]}]; s2List = Transpose[{tList, si\[Alpha]List[3, 40]}]; s3List = Transpose[{tList, si\[Alpha]List[3, 80]}]; s4List = Transpose[{tList, si\[Alpha]List[3, NSpins]}]; ListPlot[{s1List, s2List, s3List, s4List}, PlotStyle -> {Blue, Green, Red, Orange}, PlotRange -> All] ************************************************************************************* The code is very short and runs faster than all my previous codes for the same problem. There are some questions, however: 1) There is no difference in speed between uncompiled RHS and compiled cRHS. 2) On my laptop (Windows 7, 64 bit), compilation in Mathematica results in 0.070 execution time, whereas compilation in C (Microsoft Visual C++) is longer, 0.075. In all previous not fully vectorized versions of the code compiling in C gave a speed advantage by a factor 2-3. Is it a perfect code that cannot be improved by C or, for some reason, compilation in C does not work and the system returns to the Mathematica compiler? Oliver mentioned earlier that CopyTensor in the loop in the compiled code leads to slowdown. Here I do have CopyTensor in the loop. Could it be moved outside?? I will be very greatful for all comments, as usual. Dmitry