Re: Re: Division still cost more than multiplication?

• To: mathgroup at smc.vnet.net
• Subject: [mg24422] Re: [mg24390] Re: Division still cost more than multiplication?
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Tue, 18 Jul 2000 00:58:20 -0400 (EDT)
• References: <8k0tc4\$q2o@smc.vnet.net> <8k3o2q\$418@smc.vnet.net> <200007130313.XAA14017@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

Richard Fateman wrote:
>
> I don't know why you are particularly optimizing the time for a
> numeric calculation in Mathematica, since if you were concerned
> about the time, you should be using something like Fortran.
>
> In high-end computer systems, the time to do a multiply or
> a divide is essentially irrelevant, since the time to get the
> data from memory and back to memory is much greater.  Some systems
> may several arithmetic units and are quite happy to do several
> operations (adds, multiplies, divides, square roots) at the same
> time.
>
> The observation regarding Mathematica is undoubtedly unrelated to the
> time taken to do a floating-point division.  It is probably related to
> associated
> processing, maybe having to do with simplification, or error checking
> (checking for division by zero, perhaps?)
> RJF
>
> AES wrote:
> >
> > In article <8k0tc4\$q2o at smc.vnet.net>, AES <siegman at stanford.edu> wrote:
> >
> > > Old-time FORTRAN programmers (like me) were taught (at least in early
> > > days) that division cost a lot more machine cycles than multiplication.
> > > So, if you had an expression like  y = x/c  that was going to be called
> > > many times inside a loop, where x and y were variables and c a constant,
> > > you'd code this as:
> >
> >    [snip]
> >
> > > Does this still make any sense in Mathematica?  Or is it a primitive
> > > relic of long-gone days?
> > +++++++++++++++++++++++++++++++++++++++++++++
> >
> > Anticipating replies to my query that I believe are coming from P. J.
> > Hinton and Daniel Lichtbau ((thanks to both), here is the same test as
> > P. J. Hintons, performed on a 1998-model 400 MHz Mac PowerBook G3,
> > giving very similar results:
> >
> >    In[7]:= {num, den} = {Random[], Random[]}
> >
> >    Out[7]= {0.0991313, 0.0559147}
> >
> >    In[8]:= recip = 1.0 / den
> >
> >    Out[8]= 17.8844
> >
> >    In[9]:= Timing[Do[num/den, {2000000}]]
> >
> >    Out[9]= {28.8 Second, Null}
> >
> >    In[10]:= {num, den} = {Random[], Random[]}
> >
> >    Out[10]= {0.132828, 0.28853}
> >
> >    In[11]:= recip = 1.0 / den
> >
> >    Out[11]= 3.46585
> >
> >    In[12]:= Timing[Do[num * recip, {2000000}]]
> >
> >    Out[12]= {14.6833 Second, Null}

As some have already noted, tests indicate that it is generally better
to precompute the reciprocal. I replicate some experiments below and
also endeavor to explain some of the finer points. Unfortunately I am
not fully prepared to explain in full detail all the timings shown,
except to guess that some types of loop overhead are better optimized in
the Mathematica evaluator than others. This alone is of interest,
however, because the first method shown indicates how one might come
closer to the speed of compiled Fortran or C code for certain types of
iterative numeric computations.

First I'll point out that speed depends to some extent on the nature of
the operands. If one is dividing bignums, and the divisor never changes,
then it is always beneficial to precompute the reciprocal and multiply.
This is because software implementation of division is typically between
three and two times slower than multiplication, depending on the number
of digits in the operands (this is independent of relative processor
speeds of machine arithmetic multiplication vs division).

For machine numbers alot depends on how you structure the Mathematica
code. As Richard Fateman points out, one might (perhaps unintentionally)
spend most of ones time in loop and other evaluator overhead. For
examples I'll illustrate with a table of distinct numerators. The first
timing indicates that my machine/platform is somewhat slower that used
in the note quoted above.

In[25]:= Timing[Do[num*recip, {2000000}]]
Out[25]= {22.28 Second, Null}

In[1]:= nums = Table[Random[], {1000000}];

In[2]:= den = Random[];

In[4]:= Timing[quots = nums/den;]
Out[4]= {0.85 Second, Null}

This is alot faster than using a loop because, internally, the
Mathematica evaluator sees it is doing a division operation on a packed
array (in version 4, that is) and it is processed by fast C code that in
effect bypasses the evaluator (provided no overflow occurs at
intermediate steps).

Now we repeat but in a way that forces us to evaluate an iterator and
explicitly fetch entries from the packed array. This is alot slower.

In[6]:= Timing[quots2 = Table[nums[[i]]/den, {i,Length[nums]}];]
Out[6]= {8.78 Second, Null}

In[7]:= quots == quots2
Out[7]= True

We can regain some speed by avoiding explicit use of an iterator:

In[20]:= Timing[quots3 = Map[#/den&, nums];]
Out[20]= {3.7 Second, Null}

The first tow methods do not improve by using a reciprocal.

In[10]:= recip = 1/den;

In[11]:= Timing[quots3 = nums*recip;]
Out[11]= {0.84 Second, Null}

In[12]:= Timing[quots4 = Table[nums[[i]]*recip, {i,Length[nums]}];]
Out[12]= {8.38 Second, Null}

In[21]:= Timing[quots4 = Map[#*recip&, nums];]
Out[21]= {2.12 Second, Null}

The nums/den example most likely inverts den once since Mathematica
parses a/b as a*Power[b,-1]. This would clearly explain why the first
method does not improve using an explicit reciprocal. Offhand I do not
know why the second example does not seem to care; most likely it
evaluates its first argument in the Table-handling code and each step
through the iteration thereafter sees nums[[i]*(evaluated reciprocal of
den). Clearly the third method is computing 1/den every time, and that
would explain why it gets a bit faster when we use an explicit
reciprocal.

Daniel Lichtblau
Wolfram Research

• Prev by Date: Re: fade to mauve
• Next by Date: Re: Book quality layout in vers 4 (towards Quark)
• Previous by thread: Re: Division still cost more than multiplication?
• Next by thread: Findminimum Question