RE: Increase in efficiency with Module

*To*: mathgroup at smc.vnet.net*Subject*: [mg40060] RE: [mg40050] Increase in efficiency with Module*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>*Date*: Tue, 18 Mar 2003 02:21:27 -0500 (EST)*Sender*: owner-wri-mathgroup at wolfram.com

>-----Original Message----- >From: Aaron E. Hirsh [mailto:aehirsh at stanford.edu] To: mathgroup at smc.vnet.net >Sent: Monday, March 17, 2003 9:35 AM >To: mathgroup at smc.vnet.net >Subject: [mg40060] [mg40050] Increase in efficiency with Module > > >I would be grateful if someone could explain the difference in >efficiency between the following two simple programs. ry is a list of >length n. For n = 1000, the program using Module takes 1 second on my >laptop, whereas the program without Module takes 75 seconds. > >ftauc = Compile[{{ry, _Real, 1}, {n, _Real, 0}}, > > Module[{i, j, a}, > i = 1; > a = 0; > Do[ j = i + 1; > Do[ > If[ry[[i]] < ry[[j]], a++, > If[ry[[i]] > ry[[j]], a--]]; > j++, {n - i}]; > i++, {n - 1}]; a > ]] > >ftauc2 = Compile[{{ry, _Real, 1}, {n, _Real, 0}}, > i = 1; > a = 0; > Do[j = i + 1; > Do[ > If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]]; > j++, {n - i}]; > i++, {n - 1}]; a > ] > >thank you, >-- > >Aaron E. Hirsh >Center for Computational Genetics and Biological Modeling >Department of Biological Sciences >Stanford University >tel. (650) 723-4952 >fax. (650) 725-0180 > Aaron, if you inspect the compiled code ftauc[[4]] and ftauc2[[4]] you see considerable differences which are obviously caused by access to the global variables i, j and a (in ftauc). I cannot go through theses statements line by line, merely refer to http://library.wolfram.com/database/Conferences/4682/ Athough, this appears to be out of date now, a key sentence therein still holds: "The main obstacle to achieving a speedup ... is when the compiled code has to externally evaluate a function". From the compiled code for ftauc2, I suppose, that access and updates of the global variables i,j,a is done by calling an (external) function. Further: "In structures such as Module or Block, local variables can be declared. The compiled code will store value of such a variable in a register, which can then be updated whenever the variable gets a new value. Such use local variables! But let me report some further observations: First my timings were: In[3]:= ry = Table[Random[], {333}]; In[4]:= ftauc2[ry, Length[ry]] // Timing Out[4]= {12.618 Second, -846} In[5]:= ftauc[ry, Length[ry]] // Timing Out[5]= {0.16 Second, -846} Performance of the uncompiled code: In[6]:= ftaucU[ry_, n_] := Module[{i, j, a}, i = 1; a = 0; Do[j = i + 1; Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]]; j++, {n - i}]; i++, {n - 1}]; a] In[7]:= ftauc2U[ry_, n_] := (i = 1; a = 0; Do[j = i + 1; Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]]; j++, {n - i}]; i++, {n - 1}]; a) In[8]:= ftauc2U[ry, Length[ry]] // Timing Out[8]= {3.895 Second, -846} In[9]:= ftaucU[ry, Length[ry]] // Timing Out[9]= {3.846 Second, -846} Such the uncompiled code for ftauc2 is faster than the compiled one! And ftaucU (ftauc uncompiled) is a little bit faster than ftauc2U. We might expect this. The Do-loops may be improved a bit: In[19]:= ftaucUS[ry_] := Module[{a = 0, n = Length[ry]}, Do[Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]], {j, i + 1, n}], {i, 1, n - 1}]; a] In[20]:= ftaucUS[ry] // Timing Out[20]= {3.345 Second, -846} Still better code from using Sum: In[33]:= ftaucUSU[ry_] := Module[{n = Length[ry]}, Sum[Sum[If[ry[[i]] < ry[[j]], 1, If[ry[[i]] > ry[[j]], -1]], {j, i + 1, n}], {i, 1, n - 1}]] In[34]:= ftaucUSU[ry] // Timing Out[34]= {2.543 Second, -846} Functional code is better though: In[23]:= -Plus @@ Flatten[ MapIndexed[Take[#1, First[#2]] &, Outer[Order, ry, ry]]] // Timing Out[23]= {0.601 Second, -846} In[24]:= Plus @@ Flatten[ MapThread[ Thread[Unevaluated[Order[#1, #2]]] &, {ry, NestList[Rest, ry, Length[ry] - 1]}]] // Timing Out[24]= {0.33 Second, -846} If we compile the Loops with Order (which was better when not compiled) the compiled code is slower (due to the function call to Order): In[25]:= ftaucUSSc1 = Compile[{{ry, _Real, 1}}, Module[{a = 0, n = Length[ry]}, Do[Do[a += Order[ry[[i]], ry[[j]]], {j, i + 1, n}], {i, 1, n - 1}]; a], {{Order[_, _], _Integer}}] In[26]:= Timing[ftaucUSSc1[ry]] Out[26]= {0.461 Second, -846} In[30]:= ftaucSC = Compile[{{ry, _Real, 1}}, Module[{a = 0, n = Length[ry]}, Do[Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]], {j, i + 1, n}], {i, 1, n - 1}]; a]] In[31]:= Timing[ftaucSC[ry]] Out[31]= {0.13 Second, -846} Compared to your ftauc In[32]:= Timing[ftauc[ry, Length[ry]]] Out[32]= {0.16 Second, -846} corresponding to that small advantage of the uncompiled code. -- Hartmut Wolf

**Follow-Ups**:**Re: RE: Increase in efficiency with Module***From:*Dr Bob <drbob@bigfoot.com>