MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

RE (long): Re: Increase in efficiency with Module

  • To: mathgroup at smc.vnet.net
  • Subject: [mg40143] RE (long): [mg40117] Re: Increase in efficiency with Module
  • From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
  • Date: Sat, 22 Mar 2003 05:08:59 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

My dear friends,

yet another stroke of genius from Carl Woll. (If you want to have a look of
more, just scan the archive!) We earthlings, however, might question
ourselves: how only can we arrive at such an idea?

Perhaps it's helpful to show, in little, reason-able, steps how to earn 2
cents. (See below; as this thread has become rather long now, to make it
more readable, I've edited the line breaks, and snipped off most; I
apologize to Bobby Treat.)

>-----Original Message-----
>From: Carl K. Woll [mailto:carlw at u.washington.edu]
To: mathgroup at smc.vnet.net
>Sent: Friday, March 21, 2003 8:37 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg40143] [mg40117] Re: Increase in efficiency with Module
>
>
>Hi,
>
>Since this thread seems to have transformed itself into a 
>speed contest, I thought I would throw in my 2 cents. 
>Here is my attempt:
>
>h[a_]:=(s+=Tr[Sign[a-First[a]]];Rest[a])
>
>carl[t_]:=Block[{s=0}, Nest[h,t,Length[t]]; s]
>
>
>Comparing to ftauc3PP (the fastest so far according to Bobby) 
>we have:
>
>test=Table[Random[],{1000}];
>
>In[105]:=
>ftauc3PP[test]//Timing
>carl[test]//Timing
>Out[105]=
>{0.265 Second, 9362}
>Out[106]=
>{0.094 Second, 9362}
>
>So, my uncompiled code is almost 3 times faster than ftauc3PP.
>
>Carl Woll
>Physics Dept
>U of Washington
>
[...]



>> >> On Tue, 18 Mar 2003 02:21:27 -0500 (EST), Wolf, Hartmut
><Hartmut.Wolf at t-systems.com> wrote:
>> >>
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Aaron E. Hirsh [mailto:aehirsh at stanford.edu]
To: mathgroup at smc.vnet.net
>To: mathgroup at smc.vnet.net
>> >>> To: mathgroup at smc.vnet.net
>> >>>> Sent: Monday, March 17, 2003 9:35 AM
>> >>>> To: mathgroup at smc.vnet.net
>> >>>> Subject: [mg40143] [mg40117]  Increase in efficiency with Module
>> >>>>
>> >>>>
>> >>>> I would be grateful if someone could explain the difference
>> >>>> in efficiency between the following two simple programs. 
>> >>>> ry is a list of length n. 
>> >>>> For n = 1000, the program using Module takes 
>> >>>> 1 second on my laptop, whereas the program
>> >>>> without Module takes 75 seconds.
>> >>>>
>> >>>> ftauc = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>> >>>> Module[{i, j, a}, i = 1; a = 0;
>> >>>> Do[ j = i + 1;
>> >>>>    Do[
>> >>>>       If[ry[[i]] < ry[[j]], a++,
>> >>>>          If[ry[[i]] > ry[[j]], a--]];
>> >>>>       j++, {n - i}];
>> >>>>    i++, {n - 1}]; a
>> >>>> ]]
>> >>>>
>> >>>> ftauc2 = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>> >>>> i = 1;
>> >>>> a = 0;
>> >>>> Do[j = i + 1;
>> >>>>    Do[
>> >>>>       If[ry[[i]] < ry[[j]], a++,
>> >>>>          If[ry[[i]] > ry[[j]], a--]];
>> >>>>       j++, {n - i}];
>> >>>>    i++, {n - 1}];
>> >>>>  a]
>> >>>>
>> >>>> thank you,
>> >>>> --
>> >>>>
>> >>>> Aaron E. Hirsh
>> >>>> Center for Computational Genetics and Biological Modeling
>> >>>> Department of Biological Sciences
>> >>>> Stanford University
>> >>>> tel. (650) 723-4952
>> >>>> fax. (650) 725-0180
>> >>>>
>> >>>
>> >>> Aaron,
>> >>>
>> >>> if you inspect the compiled code ftauc[[4]] and 
>> >>> ftauc2[[4]] you see considerable differences 
>> >>> which are obviously caused by access to 
>> >>> the global variables i, j and a (in ftauc).
>> >>> I cannot go through theses statements
>> >>> line by line, merely refer to
>> >>> http://library.wolfram.com/database/Conferences/4682/
>> >>>
>> >>> Although, this appears to be out of date now, a key 
>> >>> sentence therein still holds: 
>> >>> "The main obstacle to achieving a speedup ... is when the
>> >>> compiled code has to externally evaluate a function".
>> >>> From the compiled code for ftauc2, I suppose,
>> >>> that access and updates of the global variables
>> >>> i,j,a is done by calling an (external) function.
>> >>>
>> >>> Further: "In structures such as Module or Block, local 
>> >>> variables can be declared. The compiled code
>> >>> will store value of such a variable in a
>> >>> register, which can then be updated whenever the 
>> >>> variable gets a new value.
>> >>>
>> >>>
>> >>> Such use local variables!
>> >>>
>> >>>
[...]

[I had snipped this off, bring it back now:]
>> >>> 
>> >>> Functional code is better though:
>> >>> 
>> >>> Plus @@ Flatten[
>> >>>       MapThread[
>> >>>         Thread[Unevaluated[Order[#1, #2]]] &, 
>> >>>           {ry, NestList[Rest, ry, Length[ry] - 1]}]]
>> >>> 
[...]

 
In[3]:= test = Table[Random[], {1000}];
 
To find this code, lets look at the list operations first, in an abstracted
manner. Simply take this for a list:

In[4]:= r = MapIndexed[(-1)^First[#2] #1 &, Range[5]]
Out[4]= {-1, 2, -3, 4, -5}


Now what have we to process? We have to compare each element of the list
with any element following that list, i.e. with any element of 

In[5]:= rr = NestList[Rest, r, Length[r] - 1]
Out[5]= 
{{-1, 2, -3, 4, -5}, {2, -3, 4, -5}, {-3, 4, -5}, {4, -5}, {-5}}


In[6]:= MapThread[f, {r, rr}]
Out[6]=
{f[-1, {-1, 2, -3, 4, -5}], f[2, {2, -3, 4, -5}],
 f[-3, {-3, 4, -5}], f[4, {4, -5}], f[-5, {-5}]}

and each of these expressions must be threaded, as e.g.

In[7]:= Thread[f[1, {1, 2, 3, 4, 5}]]
Out[7]= {f[1, 1], f[1, 2], f[1, 3], f[1, 4], f[1, 5]}


This put together:

In[8]:= MapThread[Thread[f[#1, #2]] &, {r, rr}]
Out[8]=
{{f[-1, -1], f[-1, 2], f[-1, -3], f[-1, 4], f[-1, -5]},
 {f[2, 2], f[2, -3], f[2, 4], f[2, -5]},
 {f[-3, -3], f[-3, 4], f[-3, -5]}, 
 {f[4, 4], f[4, -5]}, {f[-5, -5]}}

What we did was not quite correct, all terms f[ j, j] are not wanted, more
so this:

In[9]:=
MapThread[Thread[f[#1, #2]] &, {Drop[r, {-1}], Drop[rr, {1}]}]
Out[9]=
{{f[-1, 2], f[-1, -3], f[-1, 4], f[-1, -5]},
{f[2, -3], f[2, 4], f[2, -5]}, {f[-3, 4], f[-3, -5]}, {f[4, -5]}}


What do we have to put in? Well, what I did, was Order, giving

In[10]:= Order[#, 2] & /@ {1, 2, 3}
Out[10]= {1, 0, -1}


In[11]:=
MapThread[Thread[f[#1, #2]] &, {Drop[r, {-1}], Drop[rr, {1}]}]
   /. f -> Order
Out[11]=
{{1, -1, 1, -1}, {-1, 1, -1}, {1, -1}, {-1}}

We just to sum up all that:

In[12]:= Plus @@ Flatten[%]
Out[12]= -2

Or perhaps better

In[13]:= Apply[Plus, %%, {0, 1}]
Out[13]= -2


Now we cannot just simply insert Order for f, as within Thread, e.g.

In[14]:= Order[-1, {2, -3, 4, -5}]
Out[14]= 1

becomes evaluated prematurely. So we have to prevent this with Unevaluated.

In[15]:= 
MapThread[
    Thread[Unevaluated[Order[#1, #2]]] &, {Drop[r, {-1}], Drop[rr, {1}]}] /.

  f -> Order
Out[15]=
{{1, -1, 1, -1}, {-1, 1, -1}, {1, -1}, {-1}}


Now as Order[j, j] == 0 we can dispense with the Drop-ping

In[16]:= 
MapThread[
  Thread[Unevaluated[Order[#1, #2]]] &, {r, NestList[Rest, r, Length[r] -
1]}]
Out[16]=
{{0, 1, -1, 1, -1}, {0, -1, 1, -1}, {0, 1, -1}, {0, -1}, {0}}

...and such arrive at

In[17]:=
Plus @@ Flatten[
      MapThread[
        Thread[Unevaluated[Order[#1, #2]]] &, {test, 
          NestList[Rest, test, Length[test] - 1]}]] // Timing
Out[17]= {3.095 Second, 12110}

This certainly is not bad.


But, what I had missed here, and Bobby Treat did not, is to search for a
more specialized function for Order, when the arguments are Reals (or
Integers), namely for

In[18]:= Sign[2 - #1] & /@ {1, 2, 3}
Out[18]= {1, 0, -1}

In[19]:=
Plus @@ Flatten[
      MapThread[
        Thread[Unevaluated[Sign[#2 - #1]]] &, {test, 
          NestList[Rest, test, Length[test] - 1]}]] // Timing
Out[19]= {1.141 Second, 12110}

What a difference! 


But for Sign, we have

In[20]:= Attributes[Sign]
Out[20]= {Listable, NumericFunction, Protected}

Such we don't nead threading

In[21]:=
MapThread[Sign[#2 - #1] &, {r, NestList[Rest, r, Length[r] - 1]}]
Out[21]=
{{0, 1, -1, 1, -1}, {0, -1, 1, -1}, {0, 1, -1}, {0, -1}, {0}}

and again threading with MapThread isn't needed

In[22]:=
Sign[#2 - #1] & @@ {r, NestList[Rest, r, Length[r] - 1]}
Out[22]=
{{0, 1, -1, 1, -1}, {0, -1, 1, -1}, {0, 1, -1}, {0, -1}, {0}}

In[23]:=
Plus @@ Flatten@Sign[#2 - #1] & @@ {test, 
      NestList[Rest, test, Length[test] - 1]} // Timing
Out[23]= {1.202 Second, 12110}


Staring steadily at this expression, we might see, that instead of using the
outer Thread (or MapThread), we just might map over the second argument, and
retrieve the former first argument to Sign, by taking just the first element
of that.

In[24]:=
Sign[#1 - First[#1]] & /@ NestList[Rest, r, Length[r] - 1]
Out[24]=
{{0, 1, -1, 1, -1}, {0, -1, 1, -1}, {0, 1, -1}, {0, -1}, {0}}

This is preferable as the input structure is simpler, one argument less and
map over one list, instead of threading over two lists (at the outer level).

Also, as stated above, Flatten isn't needed if we apply Plus to both levels:

In[25]:=
Apply[Plus, 
    Sign[# - First[#]] & /@ NestList[Rest, test, Length[test] - 1], {0, 
      1}] // Timing
Out[25]= {0.861 Second, 12110}

This already is better as best compiled code (and becomes even better as
problem size increases).


Now there comes something special, which we simply have to know (and this
has been reported in this list): Trace on large (linear) lists is much more
effective than Apply-ing Plus, see:

In[26]:= test2 = Table[Random[], {500000}];

In[27]:= Plus @@ test2 // Timing
Out[27]= {1.342 Second, 250348.}

In[28]:= Tr[test2] // Timing
Out[28]= {0.02 Second, 250348.}

In[29]:= Clear[test2]


We are eager to exploit this

In[30]:=
Tr[Tr[Sign[# - First[#]]] & /@ 
      NestList[Rest, test, Length[test] - 1]] // Timing
Out[30]= {0.311 Second, 12110}

In[31]:= carl[test] // Timing
Out[31]= {0.33 Second, 12110}

Bingo!! We got him, faster!


Deplorably....

In[44]:= test3 = Table[Random[], {10000}];
In[46]:= Tr[Tr[Sign[# - First[#]]] & /@ 
      NestList[Rest, test3, Length[test3] - 1]] // Timing
>From In[46]:=

No more memory available.
Mathematica kernel has shut down.
Try quitting other applications and then retry.



Carl's program is much more effcient on memory than ours!

What can we do better? If we only could put somehow the Sign function into
the nesting operation, we would not need NestList to map over, but only
Nest.

As we finally just sum over the result of map, we could instead already sum
up when nesting. (This is a "design pattern".) So what all we have to do, is
to drag with the partial sums.

A way to do this, is to reserve an element of the nested structure to
accumulate, as in:

In[32]:=
 First[ Nest[{#[[1]] + Tr[Sign[#[[2]] - First[#[[2]]]]], 
                  Rest[#[[2]]]} &,
             {0, test}, 
             Length[test] ]] // Timing

Out[32]= {0.321 Second, 12110}

In[33]:= carl[test] // Timing
Out[33]= {0.321 Second, 12110}


Now Carl didn't do this, but instead used a local symbol to accumulate

In[34]:=
Block[{c = 0}, 
    Nest[(c += Tr[Sign[# - First[#]]]; Rest[#]) &, test, Length[test] ]; 
    c] // Timing
Out[34]= {0.321 Second, 12110}

This is Carl's code. 
Finis


--
Hartmut Wolf



  • Prev by Date: Wolfram Research Releases Control System Professional Suite
  • Next by Date: Strange behavior of Simplify
  • Previous by thread: Wolfram Research Releases Control System Professional Suite
  • Next by thread: Strange behavior of Simplify