Re: Where's the Speed?

*To*: mathgroup at smc.vnet.net*Subject*: [mg19848] Re: Where's the Speed?*From*: Robert Knapp <rknapp at wolfram.com>*Date*: Sun, 19 Sep 1999 01:20:33 -0400*Sender*: owner-wri-mathgroup at wolfram.com

Kevin J. McCann wrote: > I have read and heard a lot of hype about Mathematica 4.0 > > "featuring a > New Generation of > Fast Numerics" > > e.g. on www.wri.com . > > However, in the real world it is hard to find anything to get excited about. > I have had several disappointing results from 4.0, here are the latest: > > On a P450/NT I ran a very simple Crank-Nicholson integration of a > one-dimensional quantum free-particle wave-packet with x-dimensioned to 1001 > points and time to 401 points - no error checks or adaptive stepsize, just > plug-and-chug. Runtime under 4.0 was 501 seconds, 3.0 was 480 seconds, but > who's quibbling; however, in FORTRAN this ran in 8 seconds. I don't even > consider this to be "fast numerics", but 501 seconds sure isn't. I would > like to know where all this speed is so I can use some of it, or am I > missing something? In this case yes. I appreciate your sending me your notebook so I could work with the problem. I reworked the basic scheme slightly and reduced the time from the nearly 500 seconds to 1.6 seconds. Below I include the crucial parts of my solution. with Nx = 1001, Nt = 401 The original code was (notation changed for better newsgroup readability) offDiag = Table[-rho, {j, 2, Nx - 2}]; diag = Table[2 + 2 rho, {j, 2, Nx - 1}]; Timing[For[j = 2, j <= Nt, chi = rho (Take[U[[j - 1]], {3, Nx}] + Take[U[[j - 1]], {1, Nx - 2}]) + (2 - 2 rho)Take[U[[j - 1]], {2, Nx - 1}]; u = TridiagonalSolve[offDiag, diag, offDiag, chi]; AppendTo[u, 0.0]; PrependTo[u, 0.0]; AppendTo[U, u]; j++; ] ] {468.51 Second, Null} Here is mycode: spamlu = Experimental`ExtendedLinearSolve[ Join[{Rule[{1, 1}, 1], Rule[{Nx, Nx}, 1]}, Flatten[ Table[{Rule[{j, j - 1}, -rho], Rule[{j, j}, 2 + 2rho], Rule[{j, j + 1}, -rho]}, {j, 2, Nx - 1}]]]] Crank2[u_] := Block[{chi, v = u, n}, chi = rho(RotateRight[v] + RotateLeft[v]) + (2 - 2rho)v; chi[[{1, -1}]] = 0. + 0. I; spamlu[chi]] Timing[U2 = NestWhileList[Crank2, U[[1]], True &, 1, Nt - 1];] {1.69 Second, Null} There are two important features of the new code worth mentioning, both involving features new to version 4 First, the use of NestWhileList. If you have a loop which uses Append to accumulate a result, Mathematica has to make an extra copy of the whole result each time through the loop. If you loop many times, this can have a big effect on your overall timing. For this 401 steps with this problem, it is not a big issue, but if you want to try a much longer time integration, it could well be a factor. Here is a simple example which illustrates the problem. Each command is producing the equivalent of Range[n]. In[1]= Table[n = 2^k; {n, First[Timing[list = {1}; Do[next = Last[list] + 1; AppendTo[list, next], {n - 1}];]], First[Timing[NestWhileList[(# + 1) &, 1, True &, 1, n - 1]]]}, {k, 7, 15}] Out[1]= {{128, 0.01 Second, 0. Second}, {256, 0.01 Second, 0.01 Second}, {512, 0.03 Second, 0.01 Second}, {1024, 0.09 Second, 0.02 Second}, {2048, 0.24 Second, 0.06 Second}, {4096, 0.84 Second, 0.13 Second}, {8192, 3.83 Second, 0.26 Second}, {16384, 25.6 Second, 0.54 Second}, {32768, 139.19 Second, 1.07 Second}} Take note that while the timing for NestWhileList grows linearly with size, as it should, the timing with using Append grows quadratically -- this is because of all the copying Mathematica has to do. ----- The key feature is the use of a sparse LU decomposition to handle the solution of the linear equations involved. If you look up in the version 4 Mathematica book under Developer`SparseLinearSolve, you will see that something like {{1,1}->a,{1,2}->b,{2,2}->c}} is a sparse representation for a matrix {{a,b},{0,c}}. The basic representation is {row,col}->value: anything not explicitly given is implicitly assumed to be 0. Experimental`ExtendedLinearSolve is an externed version of LinearSolve which recognizes this (as well as the usual format) and has the ability to generate an LU decomposition in an easily usable form. Here is a simple example: In[2]:= full = {{1., 2.}, {0, -1.}}; spar = {{1, 1} -> 1., {1, 2} -> 2., {2, 2} -> -1.}; LinearSolve[full, {1, 2}] === Experimental`ExtendedLinearSolve[spar, {1, 2}] Out[4]= True But in the code above I have used an additional feature: If you give Experimental`ExtendedLinearSolve only one argument (with either full or sparse representation), it computes an LU decomposition and packages it a functional form In[5]:= splu = Experimental`ExtendedLinearSolve[spar] Out[5]= LinearSolveFunction[{2, 2}, "<>"] This is a function which operates on vectors of length 2 (or 2 by m matrices) and gives the solution In[6]:= splu[{1, 2}] Out[6]= {5., -2.} The sparse LU feature is underdocumented, but that is for the reason that it is Experimental` code which may change as we settle on the best model for handling sparse matrices. ----- Finally I will mention that even if the sparse LU feature was not there, there are other ways to optimzie this problem. One is to work with the TriDiagonalSolve to have it work with the Mathematica compiler. The package was written before the compiler could work with lists, and needs to be updated to accuont for this. Rob Knapp Numerics Development Wolfram Research, Inc. > > > When I ran the FORTRAN code and then read it into Mathematica, it still was faster > than the Mathematica code alone, although the ReadList I used on the ASCII > file did take some time. When I upped the dimensions to 2001 x 1001, the > FORTRAN ran in about 40 s - most of this is for the output; however, when I > tried to read it in to Mathematica, it took forever, and on the subsequent plot it > bombed the Kernel. I then made the mistake of saving the NB which managed > to acquire an error so that I can't open it anymore - 6 Mb of useless NB! > > -- > > Kevin J. McCann > Johns Hopkins University APL