MathGroup Archive: September 1999 [00260]

[Date Index] [Thread Index] [Author Index]
Re: Where's the Speed?
To: mathgroup at smc.vnet.net
Subject: [mg19848] Re: Where's the Speed?
From: Robert Knapp <rknapp at wolfram.com>
Date: Sun, 19 Sep 1999 01:20:33 -0400
Sender: owner-wri-mathgroup at wolfram.com
Kevin J. McCann wrote:

> I have read and heard a lot of hype about Mathematica 4.0
>
> "featuring a
>      New Generation of
>                     Fast Numerics"
>
> e.g. on www.wri.com .
>
> However, in the real world it is hard to find anything to get excited about.
> I have had several disappointing results from 4.0, here are the latest:
>
> On a P450/NT I ran a very simple Crank-Nicholson integration of a
> one-dimensional quantum free-particle wave-packet with x-dimensioned to 1001
> points and time to 401 points -  no error checks or adaptive stepsize, just
> plug-and-chug.  Runtime under 4.0 was 501 seconds, 3.0 was 480 seconds, but
> who's quibbling; however, in FORTRAN this ran in 8 seconds.  I don't even
> consider this to be "fast numerics", but 501 seconds sure isn't. I would
> like to know where all this speed is so I can use some of it, or am I
> missing something?

In this case yes.  I appreciate your sending me your notebook so I
could work with the problem.

I reworked the basic scheme slightly and reduced the time from the
nearly 500 seconds to 1.6 seconds.  Below I include the crucial parts
of my solution.

with Nx = 1001, Nt = 401

The original code was (notation changed for better newsgroup readability)

offDiag = Table[-rho, {j, 2, Nx - 2}];
diag = Table[2 + 2 rho, {j, 2, Nx - 1}];

Timing[For[j = 2, j <= Nt,
    chi =
      rho (Take[U[[j - 1]], {3, Nx}] + Take[U[[j - 1]], {1, Nx - 2}]) + (2 -
              2 rho)Take[U[[j - 1]], {2, Nx - 1}];
    u = TridiagonalSolve[offDiag, diag, offDiag, chi];
    AppendTo[u, 0.0];
    PrependTo[u, 0.0];
    AppendTo[U, u];
    j++;
    ]
  ]

{468.51 Second, Null}

Here is mycode:

spamlu = Experimental`ExtendedLinearSolve[
    Join[{Rule[{1, 1}, 1], Rule[{Nx, Nx}, 1]},
      Flatten[
        Table[{Rule[{j, j - 1}, -rho], Rule[{j, j}, 2 + 2rho],
            Rule[{j, j + 1}, -rho]}, {j, 2, Nx - 1}]]]]

Crank2[u_] :=
  Block[{chi, v = u, n},
    chi = rho(RotateRight[v] + RotateLeft[v]) + (2 - 2rho)v;
     chi[[{1, -1}]] = 0. + 0. I;
    spamlu[chi]]

Timing[U2 = NestWhileList[Crank2, U[[1]], True &, 1, Nt - 1];]

{1.69 Second, Null}

There are two important features of the new code worth mentioning, both
involving features new to version 4

First, the use of NestWhileList.  If you have a loop which uses Append
to accumulate a result,  Mathematica has to make an extra copy of the
whole result each time through the loop. If you loop many times, this
can have a big effect on your overall timing.  For this 401 steps with
this problem, it is not a big issue, but if you want to try a much
longer time integration, it could well be a factor.

Here is a simple example which illustrates the problem.  Each command
is producing the equivalent of Range[n].

In[1]=
Table[n = 2^k; {n,
    First[Timing[list = {1};
        Do[next = Last[list] + 1; AppendTo[list, next], {n - 1}];]],
    First[Timing[NestWhileList[(# + 1) &, 1, True &, 1, n - 1]]]},
  {k, 7, 15}]

Out[1]=
{{128, 0.01 Second, 0. Second}, {256, 0.01 Second, 0.01 Second}, {512,
    0.03 Second, 0.01 Second}, {1024, 0.09 Second, 0.02 Second}, {2048,
    0.24 Second, 0.06 Second}, {4096, 0.84 Second, 0.13 Second}, {8192,
    3.83 Second, 0.26 Second}, {16384, 25.6 Second, 0.54 Second}, {32768,
    139.19 Second, 1.07 Second}}

Take note that while the timing for NestWhileList grows linearly with
size, as it should, the timing with using Append grows quadratically --
this is because of all the copying Mathematica has to do.

-----

The key feature is the use of a sparse LU decomposition to handle the
solution of the linear equations involved.  If you look up in the
version 4 Mathematica book under Developer`SparseLinearSolve, you will
see that something like

{{1,1}->a,{1,2}->b,{2,2}->c}} is a sparse representation for a matrix
{{a,b},{0,c}}.  The basic representation is {row,col}->value: anything
not explicitly given is implicitly assumed to be 0.

Experimental`ExtendedLinearSolve is an externed version of LinearSolve
which recognizes this (as well as the usual format) and has the ability
to generate an LU decomposition in an easily usable form.  Here is a
simple example:

In[2]:=
full = {{1., 2.}, {0, -1.}};
spar = {{1, 1} -> 1., {1, 2} -> 2., {2, 2} -> -1.};
LinearSolve[full, {1, 2}] === Experimental`ExtendedLinearSolve[spar, {1, 2}]

Out[4]=
True

But in the code above I have used an additional feature:  If you give
Experimental`ExtendedLinearSolve only one argument (with either full or sparse
representation), it computes an LU decomposition and packages it a functional form

In[5]:=
splu = Experimental`ExtendedLinearSolve[spar]

Out[5]=
LinearSolveFunction[{2, 2}, "<>"]

This is a function which operates on vectors of length 2 (or 2 by m
matrices) and gives the solution

In[6]:=
splu[{1, 2}]

Out[6]=
{5., -2.}

The sparse LU feature is underdocumented, but that is for the reason that it is
Experimental` code which may change as we settle on the best model for handling
sparse matrices.

-----

Finally I will mention that even if the sparse LU feature was not
there, there are other ways to optimzie this problem.  One is to work
with the TriDiagonalSolve to have it work with the Mathematica
compiler.  The package was written before the compiler could work with
lists, and needs to be updated to accuont for this.

Rob Knapp
Numerics Development
Wolfram Research, Inc.

>
>
> When I ran the FORTRAN code and then read it into Mathematica, it still was faster
> than the Mathematica code alone, although the ReadList I used on the ASCII
> file did take some time.  When I upped the dimensions to 2001 x 1001, the
> FORTRAN ran in about 40 s - most of this is for the output; however, when I
> tried to read it in to Mathematica, it took forever, and on the subsequent plot it
> bombed the Kernel.  I then made the mistake of saving the NB which managed
> to acquire an error so that I can't open it anymore - 6 Mb of useless NB!
>
> --
>
> Kevin J. McCann
> Johns Hopkins University APL
Prev by Date: Re: F[f_,x_]:=f[x] ?
Next by Date: Re: AxesLabel touches the y-Axis
Previous by thread: Re: Where's the Speed?
Next by thread: Re: Where's the Speed?