Re: Performance Improvement - Need help
- To: mathgroup at smc.vnet.net
- Subject: [mg62285] Re: [mg62256] Performance Improvement - Need help
- From: "Carl K. Woll" <carlw at wolfram.com>
- Date: Sat, 19 Nov 2005 05:54:08 -0500 (EST)
- References: <200511172203.RAA16316@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Lee Newman wrote:
> Dear Group,
>
> I am working with a computational model that has a main loop which
> executes about 10^6 to 10^7 times over the course of a simulation --
> taking about 30 hrs. The bottleneck function (below) includes an outer
> product and some matrix algebra. I have optimized it to the best of my
> knowledge, but would desperately like to know if any further optimization
> might be possible (including calling external functions in C or other
> language). Any suggestions would be greatly appreciated.
>
> FUNCTION ---------------------------------------------------------
>
> UpdateSynapses = Compile[{{matrix, _Real, 2}, {vector1, _Real, 1},
> {vector2,_Real, 1}, {thresh1, _Real}, {thresh2, _Real}, {C1, _Real},
> {C2, _Real}, {maxval, _Real}},
>
> Module[{coactivation},
>
> coactivation = Outer[Times,
> FloorZero[vector2-thresh2], FloorZero[vector1- thresh1]];
>
> C2* maxval*coactivation + (1 - C2* coactivation - C1)*matrix
(The final * in the line above should be . (Dot) rather than *, I think.)
>
> ] (* end module *)
>
> , {{FloorZero[__], _Real, 1}} ];
>
> Notes:
> (1) vector1 is 1x100; vector2 is 1x1500; matrix is 100x100; matrix2 is
> 100x1500; all vectors/matrices are comprised of reals (range 0 to 1)
> and are packed.
> (2) FloorZero=Compile[{{list, _Real, 1}}, UnitStep[list] * list].
> Eliminating this function does not significantly affect performance.
> (3) run time ~ 30hrs for 10^7 iterations (Pentium 4, 2.8GHz, 1GB RAM)
>
> Regards,
> Lee Newman
Lee,
Some comments.
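1. Use Clip[vector-thresh,{0,10}] instead of FloorZero. It's a bit
faster, and a bit clearer to me at least. The upper bound of 10 is
arbitrary; it just needs to exceed the largest possible value of
vector-thresh, which is easy here since your vectors lie between 0 and 1.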
2. Your coactivation matrix can be thought of as the dot product of a
column vector c (the thresholded vector2) and a row vector r (the
thresholded vector1). In this light, the product coactivation.matrix
can be thought of as
c . (r . matrix)
instead of
(c . r) . matrix
Now, the dot product of a vector with a matrix is usually much faster
than the dot product of a matrix with a matrix, so this ought to provide
some speed gain.
3. The only thing left to worry about is the 1-C1 part of the matrix
product (1-C1-C2 coactivation).matrix. Since coactivation is a 1500x100
matrix, 1-C1 is really a 1500x100 matrix where all entries are 1-C1. It
turns out that the (1-C1).matrix part is really just 1500 copies of
(1-C1) Total[matrix], where Total[matrix] is the vector of column sums.
4. We end up with the outer product of a 1500 element column vector with
a 100 element row vector, and then to each row we add the same 100
element row vector. It turns out that instead of Outer, it's a bit
faster to use Map.
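The algebra behind points 2 to 4 can be written out as follows. This is only a sketch, in notation that is not part of the original post: a and b stand for the thresholded vector2 (length 1500) and vector1 (length 100), M for matrix, R for the result, and \mathbf{1}_n for a column of n ones. The product with matrix is taken to be Dot, as suggested in the inline correction to the quoted code.

```latex
\begin{align*}
R &= C_2\,\mathrm{maxval}\;a b^{\top}
     + \bigl((1 - C_1)\,\mathbf{1}_{1500}\mathbf{1}_{100}^{\top} - C_2\,a b^{\top}\bigr) M \\
  &= C_2\,\mathrm{maxval}\;a b^{\top}
     + (1 - C_1)\,\mathbf{1}_{1500}\bigl(\mathbf{1}_{100}^{\top} M\bigr)
     - C_2\,a \bigl(b^{\top} M\bigr) \\
  &= a\,\bigl(C_2\,\mathrm{maxval}\;b^{\top} - C_2\,b^{\top} M\bigr)
     + \mathbf{1}_{1500}\,\bigl((1 - C_1)\,\mathbf{1}_{100}^{\top} M\bigr)
\end{align*}
```

Here \mathbf{1}_{100}^{\top} M is the vector of column sums, which is what Total returns for a matrix, so row k of R is a_k times the first bracket plus the second bracket. Only vector-matrix products remain; the 1500x100 by 100x100 matrix product is never formed.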
Putting the above ideas together, I came up with the following
uncompiled function:
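update[m_, v1_, v2_, t1_, t2_, c1_, c2_, max_] :=
  Module[{f1, f2, i1, i2},
    f1 = c2 Clip[v1 - t1, {0, 10}];  (* c2 times thresholded vector1 *)
    f2 = Clip[v2 - t2, {0, 10}];     (* thresholded vector2 *)
    i1 = max f1 - f1.m;              (* row vector scaled by each f2 entry *)
    i2 = (1 - c1) Total[m];          (* row vector added to every row *)
    (i1 # + i2 &) /@ f2]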
Here is some test data:
SeedRandom[1];
m = Table[Random[], {100}, {100}];
v1 = Developer`ToPackedArray@Table[Random[], {100}];
v2 = Table[Random[], {1500}];
{t1, t2, c1, c2, max} = Table[Random[], {5}];
Let's make sure the matrices and vectors are packed:
In[9]:=
Developer`PackedArrayQ/@{m,v1,v2}
Out[9]=
{True, True, True}
Now, comparing update with UpdateSynapses:
In[10]:=
Do[r1=update[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
Do[r2=UpdateSynapses[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
r1==r2
Out[10]=
{1.516 Second, Null}
Out[11]=
{5.078 Second, Null}
Out[12]=
True
At least on my slow machine, update is more than 3 times faster. If you
experience the same speedup, then it should take less than 10 hours.
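The estimate can be spelled out from the two timings above. This is a rough sketch that assumes the whole 30-hour run is dominated by this one function and that the timing ratio carries over to Lee's machine; neither assumption is verified here.

```latex
\[
\frac{5.078\ \mathrm{s}}{1.516\ \mathrm{s}} \approx 3.35,
\qquad
30\ \mathrm{h} \times \frac{1.516}{5.078} \approx 8.96\ \mathrm{h}
\]
```

Any time spent elsewhere in the main loop is unaffected by the change, so the real total would be somewhat higher than this figure.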
Carl Woll
PS. The version of UpdateSynapses I used is:
UpdateSynapses=Compile[{
    {matrix,_Real,2},
    {vector1,_Real,1},
    {vector2,_Real,1},
    {thresh1,_Real},
    {thresh2,_Real},
    {C1,_Real},
    {C2,_Real},
    {maxval,_Real}
  },
  Module[{coactivation},
    coactivation=Outer[
      Times,
      FloorZero[vector2-thresh2],
      FloorZero[vector1-thresh1]
    ];
    C2*maxval*coactivation+(1-C2*coactivation-C1).matrix],
  {{FloorZero[__],_Real,1}}];
In[2]:=
FloorZero=Compile[{{list,_Real,1}},UnitStep[list]*list];
- References:
- Performance Improvement - Need help
- From: Lee Newman <leenewm@umich.edu>