MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: Finding Clusters


Szabolcs Horvát wrote:
> On 2009.11.04. 7:34, Fred Simons wrote:
>   
>> Here is a very short, very fast but not very simple solution:
>>
>> components[lst_List] := Module[{f},
>>    Do[Set @@ f /@ pair, {pair, lst}];   GatherBy[Union @@ lst, f]]
>>
>>     
>
> I enjoyed Fred Simons's solution tremendously.
>
> I tried to speed it up a bit.
>
> I compared the speed of components[] with the speed of WeakComponents 
> (from the GraphUtilities package) for random graphs (e.g. 
> RandomInteger[50000, {30000, 2}]).  It seems that components[] is faster 
> than WeakComponents for as long as the graph doesn't have very large 
> connected components.  However, as soon as large connected components 
> appear, components[] slows down a lot.
>
> I looked into the source of WeakComponents to find out how it works, but 
> it turns out it uses undocumented functions, such as
> SparseArray`StronglyConnectedComponents
>
> The reason for the slowdown of components[] when large connected 
> components are present is that the f[] function needs to be evaluated in 
> several steps.  E.g. for the graph {{1,2},{2,3},{3,4}}, the definition 
> of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluation of 
> f[1] will take 3 steps.
>
> I tried to remedy this by changing f so that it re-defines itself each 
> time the left-hand-side of a particular definition can be evaluated 
> further.  With the above example, evaluating f[1] would cause the 
> definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2] 
> evaluates to f[4]).  Here's the solution:
>
> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=
>    (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])
>
> components2[lst_List] :=
>   Module[{f},
>    Do[setSpecial @@ f /@ pair, {pair, lst}];
>    GatherBy[Union @@ lst, f]
>   ]
>
> This modified components2[] seems to be faster than WeakComponents[] 
> even for single-component random graphs, however, it is limited by 
> $RecursionLimit (which can't be increased indefinitely without risking a 
> crash)
>
> Szabolcs
>
> P.S. Here's the code I used to compare the speed of components[] and 
> WeakComponents[].  For 'a' greater than about 0.5 components[] gets slow.
>
> a = 0.7;
>
> tw = Table[
>    g = RandomInteger[n, {Ceiling[a n], 2}];
>    {n, First@Timing[WeakComponents[Rule @@@ g]]},
>    {n, 2^Range[11, 16]}
>    ]
>
> tc = Table[
>    g = RandomInteger[n, {Ceiling[a n], 2}];
>    {n, First@Timing[components[g]]},
>    {n, 2^Range[11, 16]}
>    ]
>
> tc2 = Table[
>    g = RandomInteger[n, {Ceiling[a n], 2}];
>    {n, First@Timing[components2[g]]},
>    {n, 2^Range[11, 16]}
>    ]
>
> ListLogLogPlot[{tw, tc, tc2}, Joined -> True,
>   PlotMarkers -> Automatic]
>
>   
Another late to the party post. I think this topic was discussed back in 
2005, and I think the quickest solution then was found in my post:*

http://tinyurl.com/ylon3hr*

Anyway, the solution was:

aggs[n_, pairs_] := Module[{sp, t},
     sp = SparseArray[Thread[pairs -> 1], {n, n}];
     t = Sign[sp + Transpose[sp]];
     SparseArray`StronglyConnectedComponents[t]]

Here, the pairs argument needs to be a list of pairs of positive 
integers, and n is the maximum of these integers. A quick comparison 
with components2 follows:

In[51]:= g = RandomInteger[{1, 10^4}, {7000, 2}];

r1 = components2[g]; // Timing
r2 = aggs[10^4, g]; // Timing

Sort[Sort /@ DeleteCases[r1, {_}]] ===  Sort[Sort /@ DeleteCases[r2, {_}]]


Out[52]= {0.499, Null}

Out[53]= {0.016, Null}

Out[54]= True

So, about 30 times faster.

Carl Woll
Wolfram Research


  • Prev by Date: Re: Finding Clusters
  • Next by Date: Re: Re: Finding Clusters
  • Previous by thread: Re: Finding Clusters
  • Next by thread: Re: Finding Clusters