Re: RE: Increase in efficiency with Module
- To: mathgroup at smc.vnet.net
- Subject: [mg40096] Re: [mg40060] RE: [mg40050] Increase in efficiency with Module
- From: Dr Bob <drbob at bigfoot.com>
- Date: Thu, 20 Mar 2003 03:33:07 -0500 (EST)
- References: <8EB1C3634596D6118952006008F711CD5BCD2F@debis.com>
- Reply-to: drbob at bigfoot.com
- Sender: owner-wri-mathgroup at wolfram.com
On lists of size 1000, the three fastest are now ftauc3, ftauc3P (5% faster), and ftauc3PP (another 8% faster). Bobby On Wed, 19 Mar 2003 09:11:54 +0100, Wolf, Hartmut <Hartmut.Wolf@t- systems.com> wrote: > Bobby, > > yes, I've missed a minor point, which I did not correct, as this had not > been the focus of the discussion: > > In[1]:= ry = Table[Random[], {1000}]; > In[3]:= > ftaucSCP = Compile[{{ry, _Real, 1}}, Module[{a = 0, n = Length[ry]}, > Do[Do[If[ry[[i]] < ry[[j]], ++a, If[ry[[i]] > ry[[j]], --a]], {j, i + 1, > n}], {i, 1, n - 1}]; a]]; > In cases like this use PreDecrement and PreIncrement, instead of > Decrement > or Increment. This gives another performance boost of a percent or a > fraction. I'm not very experienced with the Mathematica Compiler (which, > BTW, generates special code to be interpreted, not C code), however my > observation is that code which is nearest to plain C-code compiles > fastest. > And the rule I put up above is an old rule for C programming. Nowadays, > however, a modern C-compiler recognizes the situation and optimizes the > generated machine code accordingly. And also, the extra machine cycle (a > register move) (if it were) needed for a++, compared to ++a gets > unnoticed > with a modern CPU (their problems are now quite different). But this does > not apply to mathematica compiled code, as it is interpreted, sic! > > > In[4]:= Do[ftauc3[ry], {10}] // Timing > Out[4]= {12.168 Second, Null} > In[5]:= Do[ftaucSCP[ry],{10}]//Timing > Out[5]= {11.616 Second, Null} > > > Applying this to your code: > > In[20]:= > ftauc3P = Compile[{{ry, _Real, 1}}, Block[{i = 0, j, a = 0, n = > Length@ry}, Do[j = ++i; > Do[a += Sign[ry[[++j]] - ry[[i]]], {n - i}], > {n - 1}]; > a]]; > > In[21]:= Do[ftauc3P[ry], {10}] // Timing > Out[21]= {11.837 Second, Null} > > > The warning as not to use external functions does not apply to your code, > as > Sign is one of those functions which are compiled inline. > > Other improvement (Do loops) of your code: > > In[30]:= > ftauc3PP = Compile[{{ry, _Real, 1}}, > Block[{a = 0, n = Length[ry]}, > Do[ > Do[a += Sign[ry[[j]] - ry[[i]]], {j, i + 1, n}], > {i, 1, n - 1}]; > a]]; > > > In[31]:= Do[ftauc3PP[ry],{10}]//Timing > Out[31]= {10.686 Second, Null} > > That seems to be it! > > > > Hartmut > > > >> -----Original Message----- >> From: Dr Bob [mailto:drbob at bigfoot.com] To: mathgroup at smc.vnet.net >> Sent: Tuesday, March 18, 2003 9:55 PM >> To: Wolf, Hartmut; mathgroup at smc.vnet.net; aehirsh at stanford.edu >> Subject: [mg40096] Re: [mg40060] RE: [mg40050] Increase in efficiency with Module >> >> >> I timed the two original codes, my slightly modified version of that >> (treat), Hartmut's two functional codes, and his fastest code, ftaucSC. >> There's a near tie between treat, ftaucSC, and the original ftauc (in >> that order). I didn't use Real random numbers, since ties seem to be a >> significant factor in the original code, but the winning functions are >> the same with Reals anyway. >> >> Some might find the following a useful guide to timing functions. I >> randomize the order in which functions are timed for each random >> dataset, output means and standard errors sorted from fastest to >> slowest, and show how to eliminate slow functions before going on to >> larger data sets and greater numbers of trials. >> >> <<Statistics`DescriptiveStatistics` >> allNames={"ftauc2","functional","functional2","ftauc","ftaucSC" >> ,"treat"}; >> randomTrial[functions_List,n_Integer]:=Module[{data=Table[Random[Integer,50 > ],{\ >> n}],order,results}, >> order=Ordering[Random[]&/@functions]; >> results=First@Timing[#@data]&/@functions[[order]]; >> results[[Ordering@order]]/Second >> ] >> randomTrials[names:{__String},size_Integer,trials_Integer]:= >> TableForm[ >> Transpose@Module[{functions=ToExpression/@names,times,ranking},times=\ >> Transpose@Table[randomTrial[functions,size],{trials}]; >> means=Mean/@times; >> ranking=Ordering@means; >> times=times[[ranking]]; >> {names[[ranking]],Mean/@times,StandardErrorOfSampleMean/@times} >> ],TableHeadings\[Rule]{None,{"Function","Mean","Std Error"}}] >> > [...] > >> treat=Compile[{{ry,_Real,1}},Block[{i=0,j,a=0,n=Length@ry}, >> Do[i++;j=i+1; >> Do[a+=Sign[ry[[j]]-ry[[i]]];j++,{n-i}],{n-1}];a]]; > [...] > >> >> randomTrials[fastest, 1000, 100] >> >> TableForm[ > {{"Function", "Mean", "Std Error"}, {"treat", 0.27246999999999955, > 0.0009006681021902752}, > {"ftaucSC", 0.29871000000000036, 0.0009034496179205452}, {"ftauc", > 0.3101900000000001, 0.0011866849675129872}}, >> TableHeadings -> {None, {"Function", "Mean", "Std Error"}}] >> >> Bobby >> >> On Tue, 18 Mar 2003 02:21:27 -0500 (EST), Wolf, Hartmut <Hartmut.Wolf@t- >> systems.com> wrote: >> >>> >>>> -----Original Message----- >>>> From: Aaron E. Hirsh [mailto:aehirsh at stanford.edu] To: mathgroup at smc.vnet.net >>> To: mathgroup at smc.vnet.net >>>> Sent: Monday, March 17, 2003 9:35 AM >>>> To: mathgroup at smc.vnet.net >>>> Subject: [mg40096] [mg40060] [mg40050] Increase in efficiency with Module >>>> >>>> >>>> I would be grateful if someone could explain the difference in >>>> efficiency between the following two simple programs. ry is a list of >>>> length n. For n = 1000, the program using Module takes 1 second on my >>>> laptop, whereas the program without Module takes 75 seconds. >>>> >>>> ftauc = Compile[{{ry, _Real, 1}, {n, _Real, 0}}, >>>> Module[{i, j, a}, i = 1; >>>> a = 0; >>>> Do[ j = i + 1; Do[ >>>> If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]]; >>>> j++, {n - i}]; >>>> i++, {n - 1}]; a >>>> ]] >>>> >>>> ftauc2 = Compile[{{ry, _Real, 1}, {n, _Real, 0}}, >>>> i = 1; >>>> a = 0; >>>> Do[j = i + 1; >>>> Do[ >>>> If[ry[[i]] < ry[[j]], a++, If[ry[[i]] > ry[[j]], a--]]; >>>> j++, {n - i}]; >>>> i++, {n - 1}]; a >>>> ] >>>> >>>> thank you, >>>> -- >>>> >>>> Aaron E. Hirsh >>>> Center for Computational Genetics and Biological Modeling >>>> Department of Biological Sciences >>>> Stanford University >>>> tel. (650) 723-4952 >>>> fax. (650) 725-0180 >>>> >>> >>> Aaron, >>> >>> if you inspect the compiled code ftauc[[4]] and ftauc2[[4]] you see >>> considerable differences which are obviously caused by access to the > global >>> variables i, j and a (in ftauc). I cannot go through theses statements >>> line by line, merely refer to >>> http://library.wolfram.com/database/Conferences/4682/ >>> >>> Although, this appears to be out of date now, a key sentence therein > still >>> holds: "The main obstacle to achieving a speedup ... is when the >>> compiled >>> code has to externally evaluate a function". From the compiled code for >>> ftauc2, I suppose, that access and updates of the global variables >>> i,j,a is done by calling an (external) function. >>> >>> Further: "In structures such as Module or Block, local variables can be >>> declared. The compiled code will store value of such a variable in a >>> register, which can then be updated whenever the variable gets a new >>> value. >>> >>> >>> Such use local variables! >>> >>> > [...] >>> -- >>> Hartmut Wolf >>> > [...] >> -- majort at cox-internet.com >> Bobby R. Treat >> > > -- majort at cox-internet.com Bobby R. Treat