MathGroup Archive: November 2005 [00061]

[Date Index] [Thread Index] [Author Index]

Re: Re: aggregation of related elements in a list

To: mathgroup at smc.vnet.net
Subject: [mg61892] Re: Re: aggregation of related elements in a list
From: Maxim <ab_def at prontomail.com>
Date: Thu, 3 Nov 2005 04:59:13 -0500 (EST)
References: <djcgcs$6du$1@smc.vnet.net> <djfnjr$b1i$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net> <200510300443.AAA09785@smc.vnet.net> <dka0du$70e$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

On Wed, 2 Nov 2005 09:21:02 +0000 (UTC), Carl K. Woll <carl at woll2woll.com>  
wrote:

>
> Daniel Lichtblau's algorithm is very nice, and the scaling is close to
> optimal. However, I still believe that a SparseArray based approach
> might be quicker. The algorithms used to manipulate SparseArray objects
> are highly optimized, so tapping into these extremely quick algorithms
> can be very useful.
>
> At any rate, I revised my old algorithm as follows:
>
> 1. Construct a transition matrix t.
> 2. While the matrix t is not empty:
> a. Purge any components which already have more than 15 elements. That
> is, if any row has more than 15 elements, find all elements related to
> this element by finding a fixed point using matrix vector dot products.
> b. Square t
> c. Purge any components which are complete. That is, if any rows were
> unchanged after squaring, delete them.
> d. Repeat
>
> Here is the code:
>
> agg[n_, pairs_] :=
>   Module[{sp, t, rowcounts, oldrowcounts, component, complete, tmp},
>      sp = SparseArray[Thread[pairs -> 1], {n, n}];
>      t = Sign[sp + SparseArray[{i_,i_}->1, {n, n}] + Transpose[sp]];
>      rowcounts = countrows[t];
>      Reap[
>      While[
>       Total[rowcounts] > 0 (*not empty*),
>       While[Max[rowcounts] > 15 (*purge long components*),
>        component = FixedPoint[
>          Sign[t . #1] & ,
>          t[[Ordering[rowcounts, -1][[1]]]]
>          ];
>        Sow[nonzeros[component]];
>        t = sparsediagonal[1 - component] . t;
>        rowcounts = countrows[t]
>        ];
>       oldrowcounts = rowcounts;
>       t = Sign[t . t]; (* square *)
>       rowcounts = countrows[t];
>       complete = Sign[rowcounts] + Sign[oldrowcounts - rowcounts];
>       If[Total[complete] > 0, (* purge complete rows *)
>        tmp = (t . sparsediagonal[Range[n]])[[nonzeros[complete]]];
>        Sow /@ Union[
>          List @@ tmp /. SparseArray[_, _, _, {__, x_}] :> x
>          ];
>        t = sparsediagonal[1 - Sign[SparseArray[complete] . t]] . t;
>        rowcounts = countrows[t];
>        ]
>       ]
>      ][[2, 1]]
>    ]
>
> countrows[s_] := s /. SparseArray[_, _, _, {_, {x_, _}, _}] :>
> Differences[x]
>
> sparsediagonal[v_] := SparseArray[Table[{i, i}, {i, Length[v]}] ->
> Normal[v]]
>
> nonzeros[a_] :=
>   SparseArray[a] /. SparseArray[_, _, _, {_, {_, x_}, _}] :> Flatten[x]
>
> The helper function countrows counts the number of nonzero elements in
> each row of a sparse matrix, sparsediagonal converts a vector into a
> sparse matrix with the vector as diagonal elements, and nonzeros returns
> the positions of the nonzero elements in a vector. As an example where
> tapping into the sparse algorithms is helpful consider the following
> timings:
>
> v = Table[Random[Integer], {10^6}];
>
> In[53]:=
> Timing[r1 = Flatten[Position[v, Except[0], 1, Heads -> False]]; ]
> Timing[r2 = nonzeros[v]; ]
> r1 === r2
>
> Out[53]=
> {1.609 Second,Null}
>
> Out[54]=
> {0.047 Second,Null}
>
> Out[55]=
> True
>
> We see that nonzeros is over 30 times faster than using Position.
>
> Returning to the connected components problem, here is a test set, and
> comparisons of agg with agg3 (maxim) and aggregate (dan lichtblau):
>
> In[56]:=
> SeedRandom[1];
> n =40000;
> pairs = Table[RandomInteger[{1,n}],{n},{2}];
>
> In[59]:=
> r1=agg[n,pairs];//Timing
> r2=agg3[n,pairs];//Timing
> r3=aggregate[n,pairs];//Timing
> Sort[Sort/@r1]===Sort[Sort/@r2]===Sort[Sort/@r3]
>
> Out[59]=
> {1.219 Second,Null}
>
> Out[60]=
> {2.469 Second,Null}
>
> Out[61]=
> {2.25 Second,Null}
>
> Out[62]=
> True
>
> So, agg appears to be about twice as fast for this sort of data set.
>
> Carl Woll
> Wolfram Research
>

I'm guessing that Differences was defined as

Differences[x_] := ListCorrelate[{-1, 1}, x]

agg performance depends on the structure of the graph. It is still  
quadratic here:

In[6]:=
n = 10000;
pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1];
t1 = Timing[agg[n, pairs];][[1]]
n = 20000;
pairs = Flatten[Partition[#, 2, 1]& /@ Partition[Range@ n, 20], 1];
t2 = Timing[agg[n, pairs];][[1]]
t2/t1

Out[8]= 3.735*Second

Out[11]= 17.328*Second

Out[12]= 4.6393574

Also List @@ tmp won't work for large n:

Block[{n = 50000}, agg[n, Partition[Range@ n, 2]]]

This will return SystemException.

Maxim Rytin
m.r at inbox.ru

Prev by Date: Re: aggregation of related elements in a list

Next by Date: Re: ExportString[exp, "MathML"]

Previous by thread: Re: aggregation of related elements in a list

Next by thread: Re: Re: aggregation of related elements in a list