MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Finding Clusters

  • To: mathgroup at smc.vnet.net
  • Subject: [mg104644] Re: Finding Clusters
  • From: Szabolcs Horvát <szhorvat at gmail.com>
  • Date: Thu, 5 Nov 2009 03:53:23 -0500 (EST)
  • References: <200911030751.CAA01018@smc.vnet.net> <hcr78s$8c4$1@smc.vnet.net>

On 2009.11.04. 7:34, Fred Simons wrote:
> Here is a very short, very fast but not very simple solution:
>
> components[lst_List] := Module[{f},
>    Do[Set @@ f /@ pair, {pair, lst}];   GatherBy[Union @@ lst, f]]
>

I enjoyed Fred Simons's solution tremendously.

I tried to speed it up a bit.

I compared the speed of components[] with the speed of WeakComponents 
(from the GraphUtilities package) for random graphs (e.g. 
RandomInteger[50000, {30000, 2}]).  It seems that components[] is faster 
than WeakComponents for as long as the graph doesn't have very large 
connected components.  However, as soon as large connected components 
appear, components[] slows down a lot.

I looked into the source of WeakComponents to find out how it works, but 
it turns out it uses undocumented functions, such as
SparseArray`StronglyConnectedComponents

The reason for the slowdown of components[] when large connected 
components are present is that the f[] function needs to be evaluated in 
several steps.  E.g. for the graph {{1,2},{2,3},{3,4}}, the definition 
of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of 
f[1] will take 3 steps.

I tried to remedy this by changing f so that it re-defines itself each 
time the left-hand-side of a particular definition can be evaluated 
further.  With the above example, evaluating f[1] would cause the 
definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2] 
evaluates to f[4]).  Here's the solution:

setSpecial[lhs_, rhs_] /; rhs =!= lhs :=
   (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])

components2[lst_List] :=
  Module[{f},
   Do[setSpecial @@ f /@ pair, {pair, lst}];
   GatherBy[Union @@ lst, f]
  ]

This modified components2[] seems to be faster than WeakComponents[] 
even for single-component random graphs, however, it is limited by 
$RecursionLimit (which can't be increased indefinitely without risking a 
crash)

Szabolcs

P.S. Here's the code I used to compare the speed of components[] and 
WeakComponents[].  For 'a' greater than about 0.5 components[] gets slow.

a = 0.7;

tw = Table[
   g = RandomInteger[n, {Ceiling[a n], 2}];
   {n, First@Timing[WeakComponents[Rule @@@ g]]},
   {n, 2^Range[11, 16]}
   ]

tc = Table[
   g = RandomInteger[n, {Ceiling[a n], 2}];
   {n, First@Timing[components[g]]},
   {n, 2^Range[11, 16]}
   ]

tc2 = Table[
   g = RandomInteger[n, {Ceiling[a n], 2}];
   {n, First@Timing[components2[g]]},
   {n, 2^Range[11, 16]}
   ]

ListLogLogPlot[{tw, tc, tc2}, Joined -> True,
  PlotMarkers -> Automatic]


  • Prev by Date: Re: Problems with absolute PlotRange
  • Next by Date: Re: what's wrong with these expressions?
  • Previous by thread: Re: Re: Re: Finding Clusters
  • Next by thread: Re: Re: Finding Clusters