[Date Index]
[Thread Index]
[Author Index]
Re: Re: aggregation of related elements in a list
*To*: mathgroup at smc.vnet.net
*Subject*: [mg61892] Re: Re: aggregation of related elements in a list
*From*: Maxim <ab_def at prontomail.com>
*Date*: Thu, 3 Nov 2005 04:59:13 -0500 (EST)
*References*: <djcgcs$6du$1@smc.vnet.net> <djfnjr$b1i$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net> <200510300443.AAA09785@smc.vnet.net> <dka0du$70e$1@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com
On Wed, 2 Nov 2005 09:21:02 +0000 (UTC), Carl K. Woll <carl at woll2woll.com>
wrote:
>
> Daniel Lichtblau's algorithm is very nice, and the scaling is close to
> optimal. However, I still believe that a SparseArray based approach
> might be quicker. The algorithms used to manipulate SparseArray objects
> are highly optimized, so tapping into these extremely quick algorithms
> can be very useful.
>
> At any rate, I revised my old algorithm as follows:
>
> 1. Construct a transition matrix t.
> 2. While the matrix t is not empty:
> a. Purge any components which already have more than 15 elements. That
> is, if any row has more than 15 elements, find all elements related to
> this element by finding a fixed point using matrix vector dot products.
> b. Square t
> c. Purge any components which are complete. That is, if any rows were
> unchanged after squaring, delete them.
> d. Repeat
>
> Here is the code:
>
> agg[n_, pairs_] :=
> Module[{sp, t, rowcounts, oldrowcounts, component, complete, tmp},
> sp = SparseArray[Thread[pairs -> 1], {n, n}];
> t = Sign[sp + SparseArray[{i_,i_}->1, {n, n}] + Transpose[sp]];
> rowcounts = countrows[t];
> Reap[
> While[
> Total[rowcounts] > 0 (*not empty*),
> While[Max[rowcounts] > 15 (*purge long components*),
> component = FixedPoint[
> Sign[t . #1] & ,
> t[[Ordering[rowcounts, -1][[1]]]]
> ];
> Sow[nonzeros[component]];
> t = sparsediagonal[1 - component] . t;
> rowcounts = countrows[t]
> ];
> oldrowcounts = rowcounts;
> t = Sign[t . t]; (* square *)
> rowcounts = countrows[t];
> complete = Sign[rowcounts] + Sign[oldrowcounts - rowcounts];
> If[Total[complete] > 0, (* purge complete rows *)
> tmp = (t . sparsediagonal[Range[n]])[[nonzeros[complete]]];
> Sow /@ Union[
> List @@ tmp /. SparseArray[_, _, _, {__, x_}] :> x
> ];
> t = sparsediagonal[1 - Sign[SparseArray[complete] . t]] . t;
> rowcounts = countrows[t];
> ]
> ]
> ][[2, 1]]
> ]
>
> countrows[s_] := s /. SparseArray[_, _, _, {_, {x_, _}, _}] :>
> Differences[x]
>
> sparsediagonal[v_] := SparseArray[Table[{i, i}, {i, Length[v]}] ->
> Normal[v]]
>
> nonzeros[a_] :=
> SparseArray[a] /. SparseArray[_, _, _, {_, {_, x_}, _}] :> Flatten[x]
>
> The helper function countrows counts the number of nonzero elements in
> each row of a sparse matrix, sparsediagonal converts a vector into a
> sparse matrix with the vector as diagonal elements, and nonzeros returns
> the positions of the nonzero elements in a vector. As an example where
> tapping into the sparse algorithms is helpful consider the following
> timings:
>
> v = Table[Random[Integer], {10^6}];
>
> In[53]:=
> Timing[r1 = Flatten[Position[v, Except[0], 1, Heads -> False]]; ]
> Timing[r2 = nonzeros[v]; ]
> r1 === r2
>
> Out[53]=
> {1.609 Second,Null}
>
> Out[54]=
> {0.047 Second,Null}
>
> Out[55]=
> True
>
> We see that nonzeros is over 30 times faster than using Position.
>
> Returning to the connected components problem, here is a test set, and
> comparisons of agg with agg3 (maxim) and aggregate (dan lichtblau):
>
> In[56]:=
> SeedRandom[1];
> n =40000;
> pairs = Table[RandomInteger[{1,n}],{n},{2}];
>
> In[59]:=
> r1=agg[n,pairs];//Timing
> r2=agg3[n,pairs];//Timing
> r3=aggregate[n,pairs];//Timing
> Sort[Sort/@r1]===Sort[Sort/@r2]===Sort[Sort/@r3]
>
> Out[59]=
> {1.219 Second,Null}
>
> Out[60]=
> {2.469 Second,Null}
>
> Out[61]=
> {2.25 Second,Null}
>
> Out[62]=
> True
>
> So, agg appears to be about twice as fast for this sort of data set.
>
> Carl Woll
> Wolfram Research
>
I'm guessing that Differences was defined as
Differences[x_] := ListCorrelate[{-1, 1}, x]
agg performance depends on the structure of the graph. It is still
quadratic here:
In[6]:=
n = 10000;
pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1];
t1 = Timing[agg[n, pairs];][[1]]
n = 20000;
pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1];
t2 = Timing[agg[n, pairs];][[1]]
t2/t1
Out[8]= 3.735*Second
Out[11]= 17.328*Second
Out[12]= 4.6393574
Also List @@ tmp won't work for large n:
Block[{n = 50000}, agg[n, Partition[Range@ n, 2]]]
This will return SystemException.
Maxim Rytin
m.r at inbox.ru
Prev by Date:
**Re: aggregation of related elements in a list**
Next by Date:
**Re: ExportString[exp, "MathML"]**
Previous by thread:
**Re: aggregation of related elements in a list**
Next by thread:
**Re: Re: aggregation of related elements in a list**
| |