RE: RE: Increase in efficiency with Module

*To*: mathgroup at smc.vnet.net
*Subject*: [mg40090] RE: [mg40060] RE: [mg40050] Increase in efficiency with Module
*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
*Date*: Thu, 20 Mar 2003 03:32:38 -0500 (EST)
*Sender*: owner-wri-mathgroup at wolfram.com

Bobby, yes, I've missed a minor point, which I did not correct, as it had not been the focus of the discussion:

In[1]:= ry = Table[Random[], {1000}];

In[3]:= ftaucSCP = Compile[{{ry, _Real, 1}},
          Module[{a = 0, n = Length[ry]},
            Do[Do[If[ry[[i]] < ry[[j]], ++a, If[ry[[i]] > ry[[j]], --a]],
                {j, i + 1, n}], {i, 1, n - 1}]; a]];

In cases like this, use PreDecrement and PreIncrement instead of Decrement and Increment. This gives another performance boost of a percent or a fraction thereof.

I'm not very experienced with the Mathematica compiler (which, BTW, generates special code to be interpreted, not C code), but my observation is that code closest to plain C compiles fastest. The rule I gave above is an old rule for C programming. Nowadays a modern C compiler recognizes the situation and optimizes the generated machine code accordingly; likewise the extra machine cycle (a register move, if there is one) needed for a++ compared with ++a goes unnoticed on a modern CPU (their bottlenecks are quite different now). But none of this applies to Mathematica compiled code, since it is interpreted.

In[4]:= Do[ftauc3[ry], {10}] // Timing
Out[4]= {12.168 Second, Null}

In[5]:= Do[ftaucSCP[ry], {10}] // Timing
Out[5]= {11.616 Second, Null}

Applying this to your code:

In[20]:= ftauc3P = Compile[{{ry, _Real, 1}},
           Block[{i = 0, j, a = 0, n = Length@ry},
             Do[j = ++i;
               Do[a += Sign[ry[[++j]] - ry[[i]]], {n - i}], {n - 1}]; a]];

In[21]:= Do[ftauc3P[ry], {10}] // Timing
Out[21]= {11.837 Second, Null}

The warning not to use external functions does not apply to your code, as Sign is one of the functions that are compiled inline.

Another improvement (Do loops with explicit iterators) of your code:

In[30]:= ftauc3PP = Compile[{{ry, _Real, 1}},
           Block[{a = 0, n = Length[ry]},
             Do[Do[a += Sign[ry[[j]] - ry[[i]]], {j, i + 1, n}],
               {i, 1, n - 1}]; a]];

In[31]:= Do[ftauc3PP[ry], {10}] // Timing
Out[31]= {10.686 Second, Null}

That seems to be it!
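[Editorial aside, not part of the original thread: the quantity all of these loops compute is the number of concordant minus discordant pairs, i.e. the numerator of Kendall's tau. For readers outside Mathematica, a minimal Python transcription of the double loop (function name is mine):]

```python
def concordance_count(ry):
    """Concordant minus discordant pairs over all i < j, as in the ftauc loops."""
    a = 0
    n = len(ry)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if ry[i] < ry[j]:
                a += 1          # concordant pair
            elif ry[i] > ry[j]:
                a -= 1          # discordant pair; ties contribute nothing
    return a

# 5 concordant pairs, 1 discordant -> 4
print(concordance_count([1, 3, 2, 4]))
```

The O(n^2) pair scan is exactly what the compiled Mathematica versions race against each other; they differ only in how the loop bookkeeping is expressed.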
Hartmut

>-----Original Message-----
>From: Dr Bob [mailto:drbob at bigfoot.com]
>Sent: Tuesday, March 18, 2003 9:55 PM
>To: Wolf, Hartmut; mathgroup at smc.vnet.net; aehirsh at stanford.edu
>Subject: [mg40090] Re: [mg40060] RE: [mg40050] Increase in efficiency with Module
>
>I timed the two original codes, my slightly modified version of that
>(treat), Hartmut's two functional codes, and his fastest code, ftaucSC.
>There's a near tie between treat, ftaucSC, and the original ftauc (in
>that order). I didn't use Real random numbers, since ties seem to be a
>significant factor in the original code, but the winning functions are
>the same with Reals anyway.
>
>Some might find the following a useful guide to timing functions. I
>randomize the order in which functions are timed for each random
>dataset, output means and standard errors sorted from fastest to
>slowest, and show how to eliminate slow functions before going on to
>larger data sets and greater numbers of trials.
>
><< Statistics`DescriptiveStatistics`
>allNames = {"ftauc2", "functional", "functional2", "ftauc", "ftaucSC", "treat"};
>randomTrial[functions_List, n_Integer] :=
>  Module[{data = Table[Random[Integer, 50], {n}], order, results},
>    order = Ordering[Random[] & /@ functions];
>    results = First@Timing[#@data] & /@ functions[[order]];
>    results[[Ordering@order]]/Second
>    ]
>randomTrials[names : {__String}, size_Integer, trials_Integer] :=
>  TableForm[
>    Transpose@Module[{functions = ToExpression /@ names, times, ranking, means},
>      times = Transpose@Table[randomTrial[functions, size], {trials}];
>      means = Mean /@ times;
>      ranking = Ordering@means;
>      times = times[[ranking]];
>      {names[[ranking]], Mean /@ times, StandardErrorOfSampleMean /@ times}
>      ], TableHeadings -> {None, {"Function", "Mean", "Std Error"}}]
>
>[...]
>
>treat = Compile[{{ry, _Real, 1}}, Block[{i = 0, j, a = 0, n = Length@ry},
>    Do[i++; j = i + 1;
>      Do[a += Sign[ry[[j]] - ry[[i]]]; j++, {n - i}], {n - 1}]; a]];
>
>[...]
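[Editorial aside: Bobby's harness times the candidate functions in a freshly shuffled order on every trial, so cache warm-up and background load do not systematically favor whichever function happens to run first, then reports mean and standard error per function. A rough Python equivalent, under my own function names (not from the thread):]

```python
import random
import statistics
import time

def random_trial(functions, data):
    """Time each function once on the same data, in random order.
    Returns elapsed seconds in the functions' original order."""
    order = list(range(len(functions)))
    random.shuffle(order)
    results = [0.0] * len(functions)
    for k in order:
        start = time.perf_counter()
        functions[k](data)
        results[k] = time.perf_counter() - start
    return results

def random_trials(named_functions, size, trials):
    """Run `trials` randomized trials on fresh random integer data of length
    `size`; return (name, mean, std error of mean) rows, fastest first."""
    names = [n for n, _ in named_functions]
    funcs = [f for _, f in named_functions]
    times = [[] for _ in funcs]
    for _ in range(trials):
        data = [random.randint(0, 50) for _ in range(size)]
        for k, t in enumerate(random_trial(funcs, data)):
            times[k].append(t)
    rows = [(name, statistics.fmean(ts), statistics.stdev(ts) / len(ts) ** 0.5)
            for name, ts in zip(names, times)]
    return sorted(rows, key=lambda r: r[1])

for name, mean, sem in random_trials([("sorted", sorted), ("list", list)], 1000, 10):
    print(f"{name:8s} {mean:.6f} +/- {sem:.6f}")
```

As in the Mathematica version, the per-trial shuffle is undone before recording, so each function's samples stay aligned with its name across trials.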
>
>randomTrials[fastest, 1000, 100]
>
>Function   Mean      Std Error
>treat      0.27247   0.00090
>ftaucSC    0.29871   0.00090
>ftauc      0.31019   0.00119
>
>Bobby
>
>On Tue, 18 Mar 2003 02:21:27 -0500 (EST), Wolf, Hartmut
><Hartmut.Wolf at t-systems.com> wrote:
>
>>
>>> -----Original Message-----
>>> From: Aaron E. Hirsh [mailto:aehirsh at stanford.edu]
>>> Sent: Monday, March 17, 2003 9:35 AM
>>> To: mathgroup at smc.vnet.net
>>> Subject: [mg40090] [mg40060] [mg40050] Increase in efficiency with Module
>>>
>>> I would be grateful if someone could explain the difference in
>>> efficiency between the following two simple programs. ry is a list of
>>> length n. For n = 1000, the program using Module takes 1 second on my
>>> laptop, whereas the program without Module takes 75 seconds.
>>>
>>> ftauc = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>>>   Module[{i, j, a}, i = 1; a = 0;
>>>     Do[j = i + 1;
>>>       Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]];
>>>         j++, {n - i}];
>>>       i++, {n - 1}]; a
>>>     ]]
>>>
>>> ftauc2 = Compile[{{ry, _Real, 1}, {n, _Real, 0}},
>>>   i = 1; a = 0;
>>>   Do[j = i + 1;
>>>     Do[If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]];
>>>       j++, {n - i}];
>>>     i++, {n - 1}]; a
>>>   ]
>>>
>>> thank you,
>>> --
>>> Aaron E. Hirsh
>>> Center for Computational Genetics and Biological Modeling
>>> Department of Biological Sciences
>>> Stanford University
>>> tel. (650) 723-4952
>>> fax. (650) 725-0180
>>
>> Aaron,
>>
>> if you inspect the compiled code ftauc[[4]] and ftauc2[[4]], you see
>> considerable differences, which are obviously caused by the access to
>> the global variables i, j and a (in ftauc2).
>> I cannot go through these statements line by line; I merely refer to
>> http://library.wolfram.com/database/Conferences/4682/
>>
>> Although this appears to be out of date now, a key sentence therein still
>> holds: "The main obstacle to achieving a speedup ... is when the compiled
>> code has to externally evaluate a function". From the compiled code for
>> ftauc2, I suppose that access and updates of the global variables i, j, a
>> are done by calling an (external) function.
>>
>> Further: "In structures such as Module or Block, local variables can be
>> declared. The compiled code will store the value of such a variable in a
>> register, which can then be updated whenever the variable gets a new
>> value."
>>
>> So use local variables!
>>
>> [...]
>> --
>> Hartmut Wolf
>> [...]
>
>--
>majort at cox-internet.com
>Bobby R. Treat
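[Editorial aside: the register-versus-external-lookup effect Hartmut describes has a rough analogue in CPython, where local names compile to indexed LOAD_FAST slots while globals go through a dictionary lookup on every access. This is only an analogy of mine, not how the Mathematica compiler works, but it makes the same class of cost visible:]

```python
import timeit

g = 0

def bump_global(n):
    """Increment a module-level name: each access is a dict lookup."""
    global g
    for _ in range(n):
        g += 1

def bump_local(n):
    """Increment a function-local name: each access is an indexed slot."""
    a = 0
    for _ in range(n):
        a += 1
    return a

t_global = timeit.timeit(lambda: bump_global(100_000), number=20)
t_local = timeit.timeit(lambda: bump_local(100_000), number=20)
print(f"global: {t_global:.3f}s  local: {t_local:.3f}s")
```

On a typical CPython build the local version wins by a clear margin, for the same reason Hartmut's Module-local variables beat the globals in ftauc2: the fast path keeps the variable in a cheap, directly addressed slot.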