Re: Re: Finding Clusters

*To*: mathgroup at smc.vnet.net*Subject*: [mg104729] Re: [mg104644] Re: Finding Clusters*From*: Carl Woll <carlw at wolfram.com>*Date*: Sun, 8 Nov 2009 06:45:52 -0500 (EST)*References*: <200911030751.CAA01018@smc.vnet.net> <hcr78s$8c4$1@smc.vnet.net> <200911050853.DAA21759@smc.vnet.net>

Szabolcs Horvát wrote: > On 2009.11.04. 7:34, Fred Simons wrote: > >> Here is a very short, very fast but not very simple solution: >> >> components[lst_List] := Module[{f}, >> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]] >> >> > > I enjoyed Fred Simons's solution tremendously. > > I tried to speed it up a bit. > > I compared the speed of components[] with the speed of WeakComponents > (from the GraphUtilities package) for random graphs (e.g. > RandomInteger[50000, {30000, 2}]). It seems that components[] is faster > than WeakComponents for as long as the graph doesn't have very large > connected components. However, as soon as large connected components > appear, components[] slows down a lot. > > I looked into the source of WeakComponents to find out how it works, but > it turns out it uses undocumented functions, such as > SparseArray`StronglyConnectedComponents > > The reason for the slowdown of components[] when large connected > components are present is that the f[] function needs to be evaluated in > several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition > of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of > f[1] will take 3 steps. > > I tried to remedy this by changing f so that it re-defines itself each > time the left-hand-side of a particular definition can be evaluated > further. With the above example, evaluating f[1] would cause the > definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2] > evaluates to f[4]). Here's the solution: > > setSpecial[lhs_, rhs_] /; rhs =!= lhs := > (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs]) > > components2[lst_List] := > Module[{f}, > Do[setSpecial @@ f /@ pair, {pair, lst}]; > GatherBy[Union @@ lst, f] > ] > > This modified components2[] seems to be faster than WeakComponents[] > even for single-component random graphs, however, it is limited by > $RecursionLimit (which can't be increased indefinitely without risking a > crash) > > Szabolcs > > P.S. Here's the code I used to compare the speed of components[] and > WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow. > > a = 0.7; > > tw = Table[ > g = RandomInteger[n, {Ceiling[a n], 2}]; > {n, First@Timing[WeakComponents[Rule @@@ g]]}, > {n, 2^Range[11, 16]} > ] > > tc = Table[ > g = RandomInteger[n, {Ceiling[a n], 2}]; > {n, First@Timing[components[g]]}, > {n, 2^Range[11, 16]} > ] > > tc2 = Table[ > g = RandomInteger[n, {Ceiling[a n], 2}]; > {n, First@Timing[components2[g]]}, > {n, 2^Range[11, 16]} > ] > > ListLogLogPlot[{tw, tc, tc2}, Joined -> True, > PlotMarkers -> Automatic] > > Another late to the party post. I think this topic was discussed back in 2005, and I think the quickest solution then was found in my post:* http://tinyurl.com/ylon3hr* Anyway, the solution was: aggs[n_, pairs_] := Module[{sp, t}, sp = SparseArray[Thread[pairs -> 1], {n, n}]; t = Sign[sp + Transpose[sp]]; SparseArray`StronglyConnectedComponents[t]] Here, the pairs argument needs to be a list of pairs of positive integers, and n is the maximum of these integers. A quick comparison with components2 follows: In[51]:= g = RandomInteger[{1, 10^4}, {7000, 2}]; r1 = components2[g]; // Timing r2 = aggs[10^4, g]; // Timing Sort[Sort /@ DeleteCases[r1, {_}]] === Sort[Sort /@ DeleteCases[r2, {_}]] Out[52]= {0.499, Null} Out[53]= {0.016, Null} Out[54]= True So, about 30 times faster. Carl Woll Wolfram Research

**References**:**Finding Clusters***From:*fd <fdimer@gmail.com>

**Re: Finding Clusters***From:*Szabolcs Horvát <szhorvat@gmail.com>