Re: Strange Compile Results
- To: mathgroup at smc.vnet.net
- Subject: [mg6857] Re: [mg6829] Strange Compile Results
- From: David Withoff <withoff>
- Date: Thu, 24 Apr 1997 02:44:35 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
> I am trying to compile a function that operates on a long data list (all
> real values) to find the next maximum, given a certain starting point (it is a
> function that is called many times by another function). To speed it up, I
> have been attempting to compile it, and have had some strange results. Here
> is the function:
>
> Findnextmax = Compile[{{list, _Real, 1}, {startpoint, _Integer}},
>   Module[{i=0, total=0, start=0, end=0},
>     {total = Length[list]-1;
>     If[startpoint >= total, Return[startpoint]];
>     i = startpoint;
>     While[(i < (total) && (Part[list, i] >= Part[list, i+1])), i += 1];
>     While[(i < (total) && (Part[list,i] < Part[list, i+1])), i += 1];
>     If[Part[list,i]==Part[list,i+1],
>       start = i;
>       While[(i < total) && (Part[list,i] == Part[list,i+1]), i+=1];
>       end = i;
>       i = Round[(start + end)/2]]; i],
>   {{Round[_], _Integer}}]
>
> (The function basically does this: go downhill if possible, then uphill
> until the next maximum is reached. If it is flat on top, return the midpoint
> of the flat section.)
>
> Now, given a long linear array of real values, I compare timings on 100
> calls of this function versus 100 calls of the same function uncompiled
> (using Timing[Do[Findnextmax[data, 14], {100}]]). I get:
>
> Compiled:   1.09s
> Uncompiled: 0.37s
>
> This was surprising. So I played with commenting out various parts of the
> function to see which was responsible for the slowdown. Surprisingly, a big
> factor was the very last assignment statement (which originally used Floor
> instead of Round). I played with it. The only thing that worked to speed it
> up was changing the '=' to ':=', which almost doubles the speed of the
> compiled function!!! The new timing for the compiled function: 0.68s.
> Stranger yet, if I change a different assignment, say "total :=
> Length[list]-1", from '=' to ':=', I get a big slowdown again! So it appears
> to be inconsistent. There seems to be something going on with assignments
> and compilation that I don't fully understand here.
>
> The other thing to be noticed is that even with the speedup provided by the
> "correct" use of '=' and ':=', the compiled function is still almost 2x
> slower in 100 calls. To see whether this was due to parameter passing
> overhead, I moved the 100 loops inside the function. Now if I compare the
> compiled to the uncompiled function, I get:
>
> Compiled:   0.01s
> Uncompiled: 0.15s
>
> So, the remaining speed difference was due to something in function
> call/parameter passing. The only guess I had is there is some overhead
> involved in converting this list into the appropriate form for use by
> compiled functions.
>
> To test this, I compiled the external loop which calls the function, i.e.
>
> test = Compile[{{mydata, _Real, 1}, {loops, _Integer}},
>   Do[Findnextmax[mydata,14], {loops}], {{Findnextmax[_], _Integer}}]
>
> Results using this compiled function to call compiled vs. uncompiled
> versions of Findnextmax using 100 loops (e.g. Timing[a[data, 100];]):
>
> Compiled:   0.73s
> Uncompiled: 0.38s
>
> So compiling the calling function to first get the list into appropriate
> form before iterating the call has virtually no speedup effect.
>
> After all of this I can't think of anything else to make it run faster,
> except putting it inline and compiling the whole calling function. The
> problem is, the calling code is already large and messy, so that really
> isn't a viable option.
>
> Can anyone explain the following two strange effects I have observed?
> 1) 2x overhead calling a compiled function versus uncompiled
> 2) location dependence of compiled function speed on use of '=' and ':=' in
>    assignment
>
> I just recently subscribed to the group and haven't received confirmation
> yet, so please cc: me at giddings at whitewater.chem.wisc.edu.
>
> Thanks
> ---
> Michael Giddings
> giddings at whitewater.chem.wisc.edu
> giddings at barbarian.com
> (608) 258-1699 or (608) 692-2851
> http://smithlab.chem.wisc.edu/PersonalPages/giddings/giddings.html
> http://www.barbarian.com

Explanations for unexpected timing differences between compiled and
uncompiled evaluation fall into one of two categories:

1) The CompiledFunction interpreter is using uncompiled evaluation;
2) The compiled and uncompiled functions aren't really equivalent.

If the CompiledFunction interpreter is using uncompiled evaluation, then it
would have been faster to call uncompiled evaluation in the first place, and
to avoid the time-consuming detour through the CompiledFunction interpreter.
That much is obvious. The hard part is to find out that that is what is
happening, and then to figure out why it is happening, and what to do
about it.

The easiest way that I know of to find out that a CompiledFunction
expression contains calls to uncompiled evaluation is to look at the list of
pseudocode instructions in the CompiledFunction expression. Pseudocode
instructions 31 and 32 are calls to uncompiled evaluation. If you are like
me, and have a hard time remembering these sorts of details, just remember
to look at the internal structure of the CompiledFunction expression. The
calls to uncompiled evaluation are usually pretty conspicuous.

Here is what I found when I tried your example. The list of pseudocode
instructions is the third element in the CompiledFunction expression.

In[3]:= Findnextmax[[3]]

Out[3]= {{1, 2}, {6, 1, 1, 3, 0}, {3, 2, 0}, {14, 0, 1}, {14, 0, 2},
>    {14, 0, 3}, {14, 0, 4}, {107, 0, 5}, {14, -1, 6}, {33, 5, 6, 5},
>    {19, 5, 2}, {99, 2, 0, 0}, {90, 0, 3}, {8, 0}, {91, 1}, {19, 0, 1},
>    {97, 1, 2, 1}, {114, 0, 1, 0}, {14, 1, 5}, {33, 1, 5, 5},
>    {114, 0, 5, 1}, {100, 1, 0, 2}, {101, 1, 2, 3}, {90, 3, 5}, {14, 1, 6},
>    {33, 1, 6, 6}, {19, 6, 1}, {91, -11}, {97, 1, 2, 4}, {114, 0, 1, 2},
>    {14, 1, 6}, {33, 1, 6, 6}, {114, 0, 6, 3}, {98, 2, 3, 5},
>    {101, 4, 5, 6}, {90, 6, 5}, {14, 1, 7}, {33, 1, 7, 7}, {19, 7, 1},
>    {91, -11}, {114, 0, 1, 4}, {14, 1, 7}, {33, 1, 7, 7}, {114, 0, 7, 5},
>    {94, 4, 5, 0, 7}, {31, Function[{list, startpoint},
>      If[list[[i]] == list[[i + 1]],
>       start = i; While[i < total && list[[i]] == list[[i + 1]], i += 1];
>       end = i; i = Round[(start + end)/2]]], {end, 2, 0, 4, Module},
>     {start, 2, 0, 3, Module}, {total, 2, 0, 2, Module},
>     {i, 2, 0, 1, Module}, 6, 0, 17}, {8, 1}}

All that Function[{list, startpoint}, etc., etc., etc. stuff in the middle
of an otherwise innocuous collection of integers is the call to uncompiled
evaluation.
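As an aside, if you would rather not scan the instruction list by eye, a
quick programmatic check along the following lines should work. This is just
a sketch added here for illustration, and the name callsUncompiledQ is made
up; it simply tests whether the pseudocode (the third element of the
CompiledFunction expression) contains an embedded Function expression, which
is how the calls to uncompiled evaluation show up in listings like the one
above.

(* Hypothetical helper, not part of the original discussion: True if the
   pseudocode of a CompiledFunction contains an embedded Function
   expression, i.e. a call back to uncompiled evaluation. *)
callsUncompiledQ[cf_CompiledFunction] := !FreeQ[cf[[3]], _Function]

callsUncompiledQ[Findnextmax]
(* should give True for the definition above, and False once the calls to
   uncompiled evaluation have been eliminated *)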
The next step is to figure out why the CompiledFunction expression contains
a call to uncompiled evaluation, and to figure out what to do about it. In
general, there are dozens of possible explanations. The explanation that
applies in this particular example is that this is basically a deficiency of
the Compile function. Since the Compile function has no way of knowing the
outcome of list[[i]] == list[[i + 1]], it has no way of knowing whether the
return value of the enclosing If expression will be an integer or Null.

The deficiency is that the Compile function isn't smart enough to figure out
that it doesn't need to know the return value of the If expression. Your
program doesn't use that return value. In any case, when the Compile
function doesn't know what to do with an expression, it calls uncompiled
evaluation. That is what happens in your example.

If you use a delayed assignment in i := Round[(start + end)/2], then the
return value of the If expression will always be Null, since the return
value of the delayed assignment is Null. This will allow the Compile
function to compile the If expression, but it still won't know what to do
with the delayed assignment. You can get an even better effect by putting a
semicolon after the immediate assignment, which also causes the return value
of the If expression to be Null.

Findnextmax = Compile[{{list, _Real, 1}, {startpoint, _Integer}},
  Module[{i=0, total=0, start=0, end=0},
    total = Length[list]-1;
    If[startpoint >= total, Return[startpoint]];
    i = startpoint;
    While[(i < (total) && (Part[list, i] >= Part[list, i+1])), i += 1];
    While[(i < (total) && (Part[list,i] < Part[list, i+1])), i += 1];
    If[Part[list,i]==Part[list,i+1],
      start = i;
      While[(i < total) && (Part[list,i] == Part[list,i+1]), i+=1];
      end = i;
      i = Round[(start + end)/2];]; i],
  {{Round[_], _Integer}}]

After this change, I get

In[13]:= Findnextmax[[3]]

Out[13]= {{1, 2}, {6, 1, 1, 3, 0}, {3, 2, 0}, {14, 0, 1}, {14, 0, 2},
>    {14, 0, 3}, {14, 0, 4}, {107, 0, 5}, {14, -1, 6}, {33, 5, 6, 5},
>    {19, 5, 2}, {99, 2, 0, 0}, {90, 0, 3}, {8, 0}, {91, 1}, {19, 0, 1},
>    {97, 1, 2, 1}, {114, 0, 1, 0}, {14, 1, 5}, {33, 1, 5, 5},
>    {114, 0, 5, 1}, {100, 1, 0, 2}, {101, 1, 2, 3}, {90, 3, 5}, {14, 1, 6},
>    {33, 1, 6, 6}, {19, 6, 1}, {91, -11}, {97, 1, 2, 4}, {114, 0, 1, 2},
>    {14, 1, 6}, {33, 1, 6, 6}, {114, 0, 6, 3}, {98, 2, 3, 5},
>    {101, 4, 5, 6}, {90, 6, 5}, {14, 1, 7}, {33, 1, 7, 7}, {19, 7, 1},
>    {91, -11}, {114, 0, 1, 4}, {14, 1, 7}, {33, 1, 7, 7}, {114, 0, 7, 5},
>    {94, 4, 5, 0, 7}, {90, 7, 26}, {19, 1, 3}, {97, 1, 2, 8},
>    {114, 0, 1, 6}, {14, 1, 8}, {33, 1, 8, 8}, {114, 0, 8, 7},
>    {94, 6, 7, 0, 9}, {101, 8, 9, 10}, {90, 10, 5}, {14, 1, 9},
>    {33, 1, 9, 9}, {19, 9, 1}, {91, -11}, {19, 1, 4}, {15, 0.5, 8},
>    {33, 3, 4, 9}, {14, 2, 10}, {24, 10, 9}, {45, 9, 10}, {24, 9, 11},
>    {38, 11, 10, 9}, {34, 8, 9, 8}, {83, 8, 9}, {19, 9, 1}, {91, 1}, {8, 1}}

This is what you want to see: a nice, clean collection of pseudocode
instructions, with no calls to uncompiled evaluation.
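The same issue can be reproduced in miniature, apart from Findnextmax. The
following is just an added illustration (the names g1 and g2 are arbitrary,
and the exact behavior may depend on the version of Compile): a one-branch
If can return either an integer or Null, which is the kind of ambiguity that
pushes Compile into uncompiled evaluation, while terminating the branch with
a semicolon makes the If always return Null, so the whole body can be
compiled. Comparing the third element of each CompiledFunction, as above,
should show the difference.

(* One-branch If: returns x (an integer) when x > 0, otherwise Null *)
g1 = Compile[{{x, _Integer}}, Module[{i = 0}, If[x > 0, i = x]; i]]

(* Trailing semicolon: the If now returns Null in both cases *)
g2 = Compile[{{x, _Integer}}, Module[{i = 0}, If[x > 0, i = x;]; i]]

g1[[3]]   (* likely to contain an embedded Function[...] call *)
g2[[3]]   (* should be a plain list of numeric instructions *)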
Returning again to more general issues, the overhead that you observed
associated with sending arguments to a CompiledFunction expression is
something that I would put in my second category of explanations for
unexpected speed differences between compiled and uncompiled evaluation:
the compiled and uncompiled functions aren't really equivalent.

The CompiledFunction interpreter does indeed convert the input expression
into an internal form (an array of numbers). This will speed up subsequent
operations, but the conversion takes some time. Here is an example of a
program that takes an array as an argument, but that doesn't do anything
with the array, and instead just returns zero.

In[14]:= Clear[f, fc]

In[15]:= f[m_] = 0

Out[15]= 0

In[16]:= fc = Compile[{{m, _Real, 1}}, 0]

Out[16]= CompiledFunction[{m}, 0, -CompiledCode-]

In[17]:= data = Table[1., {1000000}];

In[18]:= Timing[f[data]]

Out[18]= {0. Second, 0}

In[19]:= Timing[fc[data]]

Out[19]= {4.09344 Second, 0}

As you can see, compiled evaluation takes a lot longer in this example than
uncompiled evaluation, for the simple reason that it takes a certain amount
of time to check that the argument is an array of real numbers, and to
allocate space for 1000000 real numbers. This isn't really a fair
comparison, since these programs aren't really equivalent. The definition
f[m_] = 0 doesn't even check that the argument is a list of real numbers. A
fairer comparison is with a definition that includes a pattern which checks
that the argument is a list of real numbers. With this change, the time for
compiled evaluation and the time for uncompiled evaluation are comparable,
since both operations spend most of their time checking that the argument is
a list of real numbers.

In[20]:= fp[m:{___Real}] = 0

Out[20]= 0

In[21]:= Timing[fp[data]]

Out[21]= {4.57812 Second, 0}

Finally, regarding your experiment with

test = Compile[{{mydata, _Real, 1}, {loops, _Integer}},
  Do[Findnextmax[mydata,14], {loops}], {{Findnextmax[_], _Integer}}]

this may not be doing quite what you wanted it to do. For the outer Compile,
the function Findnextmax is just another unknown function, which will be
handled by calling uncompiled evaluation. Your test function will convert
mydata into an internal array of real numbers, but it will either convert it
back into an expression or use the original expression (I can't remember
which) for the purpose of calling Findnextmax in an uncompiled evaluation.
The evaluator will subsequently realize that Findnextmax is a
CompiledFunction expression, and will call the CompiledFunction interpreter,
but there is no reduction in the time spent doing conversions.

These and other notes about the Compile function will soon be included in
the technical support section of the Wolfram Research web site
(http://www.wolfram.com/support), and will probably eventually be available
as a technical report, or in other documentation.

Dave Withoff
Wolfram Research