Re: Re: aggregation of related elements in a list

*To*: mathgroup at smc.vnet.net*Subject*: [mg61892] Re: Re: aggregation of related elements in a list*From*: Maxim <ab_def at prontomail.com>*Date*: Thu, 3 Nov 2005 04:59:13 -0500 (EST)*References*: <djcgcs$6du$1@smc.vnet.net> <djfnjr$b1i$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net> <200510300443.AAA09785@smc.vnet.net> <dka0du$70e$1@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

On Wed, 2 Nov 2005 09:21:02 +0000 (UTC), Carl K. Woll <carl at woll2woll.com> wrote: > > Daniel Lichtblau's algorithm is very nice, and the scaling is close to > optimal. However, I still believe that a SparseArray based approach > might be quicker. The algorithms used to manipulate SparseArray objects > are highly optimized, so tapping into these extremely quick algorithms > can be very useful. > > At any rate, I revised my old algorithm as follows: > > 1. Construct a transition matrix t. > 2. While the matrix t is not empty: > a. Purge any components which already have more than 15 elements. That > is, if any row has more than 15 elements, find all elements related to > this element by finding a fixed point using matrix vector dot products. > b. Square t > c. Purge any components which are complete. That is, if any rows were > unchanged after squaring, delete them. > d. Repeat > > Here is the code: > > agg[n_, pairs_] := > Module[{sp, t, rowcounts, oldrowcounts, component, complete, tmp}, > sp = SparseArray[Thread[pairs -> 1], {n, n}]; > t = Sign[sp + SparseArray[{i_,i_}->1, {n, n}] + Transpose[sp]]; > rowcounts = countrows[t]; > Reap[ > While[ > Total[rowcounts] > 0 (*not empty*), > While[Max[rowcounts] > 15 (*purge long components*), > component = FixedPoint[ > Sign[t . #1] & , > t[[Ordering[rowcounts, -1][[1]]]] > ]; > Sow[nonzeros[component]]; > t = sparsediagonal[1 - component] . t; > rowcounts = countrows[t] > ]; > oldrowcounts = rowcounts; > t = Sign[t . t]; (* square *) > rowcounts = countrows[t]; > complete = Sign[rowcounts] + Sign[oldrowcounts - rowcounts]; > If[Total[complete] > 0, (* purge complete rows *) > tmp = (t . sparsediagonal[Range[n]])[[nonzeros[complete]]]; > Sow /@ Union[ > List @@ tmp /. SparseArray[_, _, _, {__, x_}] :> x > ]; > t = sparsediagonal[1 - Sign[SparseArray[complete] . t]] . t; > rowcounts = countrows[t]; > ] > ] > ][[2, 1]] > ] > > countrows[s_] := s /. SparseArray[_, _, _, {_, {x_, _}, _}] :> > Differences[x] > > sparsediagonal[v_] := SparseArray[Table[{i, i}, {i, Length[v]}] -> > Normal[v]] > > nonzeros[a_] := > SparseArray[a] /. SparseArray[_, _, _, {_, {_, x_}, _}] :> Flatten[x] > > The helper function countrows counts the number of nonzero elements in > each row of a sparse matrix, sparsediagonal converts a vector into a > sparse matrix with the vector as diagonal elements, and nonzeros returns > the positions of the nonzero elements in a vector. As an example where > tapping into the sparse algorithms is helpful consider the following > timings: > > v = Table[Random[Integer], {10^6}]; > > In[53]:= > Timing[r1 = Flatten[Position[v, Except[0], 1, Heads -> False]]; ] > Timing[r2 = nonzeros[v]; ] > r1 === r2 > > Out[53]= > {1.609 Second,Null} > > Out[54]= > {0.047 Second,Null} > > Out[55]= > True > > We see that nonzeros is over 30 times faster than using Position. > > Returning to the connected components problem, here is a test set, and > comparisons of agg with agg3 (maxim) and aggregate (dan lichtblau): > > In[56]:= > SeedRandom[1]; > n =40000; > pairs = Table[RandomInteger[{1,n}],{n},{2}]; > > In[59]:= > r1=agg[n,pairs];//Timing > r2=agg3[n,pairs];//Timing > r3=aggregate[n,pairs];//Timing > Sort[Sort/@r1]===Sort[Sort/@r2]===Sort[Sort/@r3] > > Out[59]= > {1.219 Second,Null} > > Out[60]= > {2.469 Second,Null} > > Out[61]= > {2.25 Second,Null} > > Out[62]= > True > > So, agg appears to be about twice as fast for this sort of data set. > > Carl Woll > Wolfram Research > I'm guessing that Differences was defined as Differences[x_] := ListCorrelate[{-1, 1}, x] agg performance depends on the structure of the graph. It is still quadratic here: In[6]:= n = 10000; pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1]; t1 = Timing[agg[n, pairs];][[1]] n = 20000; pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1]; t2 = Timing[agg[n, pairs];][[1]] t2/t1 Out[8]= 3.735*Second Out[11]= 17.328*Second Out[12]= 4.6393574 Also List @@ tmp won't work for large n: Block[{n = 50000}, agg[n, Partition[Range@ n, 2]]] This will return SystemException. Maxim Rytin m.r at inbox.ru