MathGroup Archive: March 2003 [00400]

[Date Index] [Thread Index] [Author Index]

RE: Increase in efficiency with Module

To: mathgroup at smc.vnet.net
Subject: [mg40060] RE: [mg40050] Increase in efficiency with Module
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Tue, 18 Mar 2003 02:21:27 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>From: Aaron E. Hirsh [mailto:aehirsh at stanford.edu]
To: mathgroup at smc.vnet.net
>Sent: Monday, March 17, 2003 9:35 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg40060] [mg40050] Increase in efficiency with Module
>
>
>I would be grateful if someone could explain the difference in 
>efficiency between the following two simple programs. ry is a list of 
>length n. For n = 1000, the program using Module takes 1 second on my 
>laptop, whereas the program without Module takes 75 seconds.
>
>ftauc = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>    
>    Module[{i, j, a},     
>      i = 1;
>      a = 0;
>      Do[ j = i + 1;        
>        Do[
>          If[ry[[i]] < ry[[j]], a++, 
>             If[ry[[i]] > ry[[j]], a--]];
>          j++, {n - i}];
>        i++, {n - 1}]; a
>      ]]
>
>ftauc2 = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>    i = 1;
>    a = 0;
>    Do[j = i + 1;
>      Do[
>        If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]];
>        j++, {n - i}];
>      i++, {n - 1}]; a
>    ]
>
>thank you,
>-- 
>
>Aaron E. Hirsh
>Center for Computational Genetics and Biological Modeling
>Department of Biological Sciences
>Stanford University
>tel. (650) 723-4952
>fax. (650) 725-0180
>

Aaron,

if you inspect the compiled code ftauc[[4]] and ftauc2[[4]] you see
considerable differences which are obviously caused by access to the global
variables i, j and a (in ftauc). I cannot go through theses statements line
by line, merely refer to 
http://library.wolfram.com/database/Conferences/4682/ 

Athough, this appears to be out of date now, a key sentence therein still
holds: "The main obstacle to achieving a speedup ... is when the compiled
code has to externally evaluate a function". From the compiled code for
ftauc2, I suppose, that access and updates of the global variables i,j,a is
done by calling an (external) function. 

Further: "In structures such as Module or Block, local variables can be
declared. The compiled code will store value of such a variable in a
register, which can then be updated whenever the variable gets a new value.


Such use local variables!


But let me report some further observations:

First my timings were:

In[3]:= ry = Table[Random[], {333}];

In[4]:= ftauc2[ry, Length[ry]] // Timing
Out[4]= {12.618 Second, -846}

In[5]:= ftauc[ry, Length[ry]] // Timing
Out[5]= {0.16 Second, -846}



Performance of the uncompiled code:

In[6]:=
ftaucU[ry_, n_] := Module[{i, j, a}, i = 1;
    a = 0;
    Do[j = i + 1;
      Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]];
        j++, {n - i}];
      i++, {n - 1}]; a]

In[7]:=
ftauc2U[ry_, n_] := (i = 1;
    a = 0;
    Do[j = i + 1;
      Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]];
        j++, {n - i}];
      i++, {n - 1}]; a)

In[8]:= ftauc2U[ry, Length[ry]] // Timing
Out[8]= {3.895 Second, -846}

In[9]:= ftaucU[ry, Length[ry]] // Timing
Out[9]= {3.846 Second, -846}

Such the uncompiled code for ftauc2 is faster than the compiled one! And
ftaucU (ftauc uncompiled) is a little bit faster than ftauc2U. We might
expect this.



The Do-loops may be improved a bit:

In[19]:=
ftaucUS[ry_] := Module[{a = 0, n = Length[ry]}, 
   Do[Do[If[ry[[i]] < ry[[j]], a++, 
       If[ry[[i]] > ry[[j]], a--]], {j, i + 1, n}], 
     {i, 1, n - 1}]; a]

In[20]:= ftaucUS[ry] // Timing
Out[20]= {3.345 Second, -846}


Still better code from using Sum:

In[33]:=
ftaucUSU[ry_] := Module[{n = Length[ry]}, 
   Sum[Sum[If[ry[[i]] < ry[[j]], 1, 
      If[ry[[i]] > ry[[j]], -1]], {j, i + 1, n}], 
    {i, 1, n - 1}]]

In[34]:= ftaucUSU[ry] // Timing
Out[34]= {2.543 Second, -846}


Functional code is better though:

In[23]:= 
-Plus @@ Flatten[
        MapIndexed[Take[#1, First[#2]] &, Outer[Order, ry, ry]]] // Timing

Out[23]= {0.601 Second, -846}



In[24]:= 
Plus @@ Flatten[
      MapThread[
        Thread[Unevaluated[Order[#1, #2]]] &, {ry, 
          NestList[Rest, ry, Length[ry] - 1]}]] // Timing

Out[24]= {0.33 Second, -846}



If we compile the Loops with Order (which was better when not compiled) the
compiled code is slower (due to the function call to Order):

In[25]:=
ftaucUSSc1 = Compile[{{ry, _Real, 1}}, 
   Module[{a = 0, n = Length[ry]}, 
    Do[Do[a += Order[ry[[i]], ry[[j]]], 
       {j, i + 1, n}], {i, 1, n - 1}]; a], 
   {{Order[_, _], _Integer}}]

In[26]:= Timing[ftaucUSSc1[ry]]
Out[26]= {0.461 Second, -846}


In[30]:=
ftaucSC = Compile[{{ry, _Real, 1}}, 
   Module[{a = 0, n = Length[ry]}, 
    Do[Do[If[ry[[i]] < ry[[j]], a++, 
        If[ry[[i]] > ry[[j]], a--]], {j, i + 1, n}], 
      {i, 1, n - 1}]; a]]

In[31]:= Timing[ftaucSC[ry]]
Out[31]= {0.13 Second, -846}


Compared to your ftauc

In[32]:= Timing[ftauc[ry, Length[ry]]]
Out[32]= {0.16 Second, -846}

corresponding to that small advantage of the uncompiled code.


--
Hartmut Wolf

Follow-Ups:
- Re: RE: Increase in efficiency with Module
  - From: Dr Bob <drbob@bigfoot.com>

Prev by Date: Re: Re: discretization and plotting pde system

Next by Date: Re: Q: lists

Previous by thread: Re: Increase in efficiency with Module

Next by thread: Re: RE: Increase in efficiency with Module