Re: Performance Improvement - Need help

*To*: mathgroup at smc.vnet.net
*Subject*: [mg62285] Re: [mg62256] Performance Improvement - Need help
*From*: "Carl K. Woll" <carlw at wolfram.com>
*Date*: Sat, 19 Nov 2005 05:54:08 -0500 (EST)
*References*: <200511172203.RAA16316@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

Lee Newman wrote:
> Dear Group,
>
> I am working with a computational model that has a main loop which
> executes about 10^6 to 10^7 times over the course of a simulation --
> taking about 30 hrs. The bottleneck function (below) includes an outer
> product and some matrix algebra. I have optimized it to the best of my
> knowledge, but would desperately like to know if any further optimization
> might be possible (including calling external functions in C or another
> language). Any suggestions would be greatly appreciated.
>
> FUNCTION ---------------------------------------------------------
>
> UpdateSynapses = Compile[{{matrix, _Real, 2}, {vector1, _Real, 1},
>   {vector2,_Real, 1}, {thresh1, _Real}, {thresh2, _Real}, {C1, _Real},
>   {C2, _Real}, {maxval, _Real}},
>
>   Module[{coactivation},
>
>     coactivation = Outer[Times,
>       FloorZero[vector2-thresh2], FloorZero[vector1- thresh1]];
>
>     C2* maxval*coactivation + (1 - C2* coactivation - C1)*matrix

[The final * in the line above, just before matrix, should be . not *, I think.]

>
>   ] (* end module *)
>
>   , {{FloorZero[__], _Real, 1}} ];
>
> Notes:
> (1) vector1 is 1x100; vector2 is 1x1500; matrix is 100x100; matrix2 is
> 100x1500; all vectors/matrices are comprised of reals (range 0 to 1)
> and are packed.
> (2) FloorZero=Compile[{{list, _Real, 1}}, UnitStep[list] * list].
> Eliminating this function does not significantly affect performance.
> (3) run time ~ 30hrs for 10^7 iterations (Pentium 4, 2.8GHz, 1GB RAM)
>
> Regards,
> Lee Newman

Lee,

Some comments.

1. Use Clip[vector-thresh,{0,10}] instead of FloorZero. It's a bit faster, and a bit clearer to me at least. (The upper bound of 10 is never reached, since your values lie between 0 and 1.)

2. Your coactivation matrix can be thought of as the dot product of a column vector c and a row vector r. In this light, the dot product coactivation.matrix can be thought of as c . (r . matrix) instead of (c . r) . matrix. Now, the dot product of a vector with a matrix is usually much faster than the dot product of a matrix with a matrix, so this ought to provide some speed gain.

3. The only thing left to worry about is the 1-C1 part of the matrix product (1-C1-C2 coactivation).matrix. Since coactivation is a 1500x100 matrix, 1-C1 is really a 1500x100 matrix where all entries are 1-C1. It turns out that the (1-C1).matrix part is really just 1500 copies of (1-C1) Total[matrix].

4. We end up with the outer product of a 1500 element column vector with a 100 element row vector, and then to each row we add the same 100 element row vector. It turns out that instead of Outer, it's a bit faster to use Map.

Putting the above ideas together, I came up with the following uncompiled function:

    update[m_, v1_, v2_, t1_, t2_, c1_, c2_, max_] :=
      Module[{f1, f2, i1, i2},
        f1 = c2 Clip[v1 - t1, {0, 10}];
        f2 = Clip[v2 - t2, {0, 10}];
        i1 = max f1 - f1.m;
        i2 = (1 - c1)Total[m];
        (i1# + i2 &) /@ f2]

Here is some test data:

    SeedRandom[1];
    m = Table[Random[], {100}, {100}];
    v1 = Developer`ToPackedArray@Table[Random[], {100}];
    v2 = Table[Random[], {1500}];
    {t1, t2, c1, c2, max} = Table[Random[], {5}];

Let's make sure the matrices and vectors are packed:

    In[9]:=
    Developer`PackedArrayQ/@{m,v1,v2}

    Out[9]=
    {True, True, True}

Now, comparing update with UpdateSynapses:

    In[10]:=
    Do[r1=update[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
    Do[r2=UpdateSynapses[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
    r1==r2

    Out[10]=
    {1.516 Second, Null}

    Out[11]=
    {5.078 Second, Null}

    Out[12]=
    True

At least on my slow machine, update is more than 3 times faster. If you experience the same speedup, then your 30 hour run should take less than 10 hours.

Carl Woll

PS. The version of UpdateSynapses I used (with . in place of the final *) is:

    UpdateSynapses=Compile[{
        {matrix,_Real,2},
        {vector1,_Real,1},
        {vector2,_Real,1},
        {thresh1,_Real},
        {thresh2,_Real},
        {C1,_Real},
        {C2,_Real},
        {maxval,_Real}
      },
      Module[{coactivation},
        coactivation=Outer[
          Times,
          FloorZero[vector2-thresh2],
          FloorZero[vector1-thresh1]
        ];
        C2*maxval*coactivation+(1-C2*coactivation-C1).matrix],
      {{FloorZero[__],_Real,1}}];

    In[2]:=
    FloorZero=Compile[{{list,_Real,1}},UnitStep[list]*list];
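
For readers who want to see why update and UpdateSynapses agree, the algebra behind points 2 and 3 of the reply can be written out roughly as follows. This is only a sketch: the symbols M, r, c and the all-ones vectors are notation assumed here for illustration and do not appear in the original post.

```latex
% Assumed notation (not in the original post):
%   M = matrix (100 x 100)
%   r = Clip[v1 - t1, {0, 10}]   (length 100)
%   c = Clip[v2 - t2, {0, 10}]   (length 1500)
%   \mathbf{1}_n = column vector of n ones
\begin{align*}
\text{coactivation} &= c\,r^{\mathsf T} \\
\text{result} &= C_2\,\mathrm{maxval}\;c\,r^{\mathsf T}
  + \bigl((1 - C_1)\,\mathbf{1}_{1500}\mathbf{1}_{100}^{\mathsf T}
  - C_2\,c\,r^{\mathsf T}\bigr)\,M \\
&= c\,\bigl(C_2\,\mathrm{maxval}\;r^{\mathsf T} - C_2\,r^{\mathsf T} M\bigr)
  + (1 - C_1)\,\mathbf{1}_{1500}\,\bigl(\mathbf{1}_{100}^{\mathsf T} M\bigr)
\end{align*}
```

With f1 = C2 r and f2 = c, the first bracket is the row vector i1 = max f1 - f1.m from update, and 1^T M is the vector of column sums Total[m], so the second term places i2 = (1 - c1) Total[m] in every one of the 1500 rows. The 1500x100 by 100x100 matrix product (about 1.5x10^7 multiplications) is replaced by one 100x100 vector-matrix product (10^4) plus a 1500x100 outer product (1.5x10^5), which is consistent with the direction of the measured speedup.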

**References**:
**Performance Improvement - Need help**
*From:* Lee Newman <leenewm@umich.edu>