Re: Finding Clusters

*To*: mathgroup at smc.vnet.net*Subject*: [mg104644] Re: Finding Clusters*From*: Szabolcs Horvát <szhorvat at gmail.com>*Date*: Thu, 5 Nov 2009 03:53:23 -0500 (EST)*References*: <200911030751.CAA01018@smc.vnet.net> <hcr78s$8c4$1@smc.vnet.net>

On 2009.11.04. 7:34, Fred Simons wrote: > Here is a very short, very fast but not very simple solution: > > components[lst_List] := Module[{f}, > Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]] > I enjoyed Fred Simons's solution tremendously. I tried to speed it up a bit. I compared the speed of components[] with the speed of WeakComponents (from the GraphUtilities package) for random graphs (e.g. RandomInteger[50000, {30000, 2}]). It seems that components[] is faster than WeakComponents for as long as the graph doesn't have very large connected components. However, as soon as large connected components appear, components[] slows down a lot. I looked into the source of WeakComponents to find out how it works, but it turns out it uses undocumented functions, such as SparseArray`StronglyConnectedComponents The reason for the slowdown of components[] when large connected components are present is that the f[] function needs to be evaluated in several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of f[1] will take 3 steps. I tried to remedy this by changing f so that it re-defines itself each time the left-hand-side of a particular definition can be evaluated further. With the above example, evaluating f[1] would cause the definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2] evaluates to f[4]). Here's the solution: setSpecial[lhs_, rhs_] /; rhs =!= lhs := (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs]) components2[lst_List] := Module[{f}, Do[setSpecial @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f] ] This modified components2[] seems to be faster than WeakComponents[] even for single-component random graphs, however, it is limited by $RecursionLimit (which can't be increased indefinitely without risking a crash) Szabolcs P.S. Here's the code I used to compare the speed of components[] and WeakComponents[]. For 'a' greater than about 0.5 components[] gets slow. a = 0.7; tw = Table[ g = RandomInteger[n, {Ceiling[a n], 2}]; {n, First@Timing[WeakComponents[Rule @@@ g]]}, {n, 2^Range[11, 16]} ] tc = Table[ g = RandomInteger[n, {Ceiling[a n], 2}]; {n, First@Timing[components[g]]}, {n, 2^Range[11, 16]} ] tc2 = Table[ g = RandomInteger[n, {Ceiling[a n], 2}]; {n, First@Timing[components2[g]]}, {n, 2^Range[11, 16]} ] ListLogLogPlot[{tw, tc, tc2}, Joined -> True, PlotMarkers -> Automatic]

**Follow-Ups**:**Re: Re: Finding Clusters***From:*Carl Woll <carlw@wolfram.com>

**References**:**Finding Clusters***From:*fd <fdimer@gmail.com>