Re: Finding Clusters
- To: mathgroup at smc.vnet.net
- Subject: [mg104746] Re: Finding Clusters
- From: Maxim <m.r at inbox.ru>
- Date: Sun, 8 Nov 2009 06:49:23 -0500 (EST)
- References: <200911030751.CAA01018@smc.vnet.net> <hcr78s$8c4$1@smc.vnet.net>
On Nov 5, 3:22 am, Szabolcs Horv=E1t <szhor... at gmail.com> wrote:
> On 2009.11.04. 7:34, Fred Simons wrote:
>
> > Here is a very short, very fast but not very simple solution:
>
> > components[lst_List] := Module[{f},
> > Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]=
]
>
> I enjoyed Fred Simons's solution tremendously.
>
> I tried to speed it up a bit.
>
> I compared the speed of components[] with the speed of WeakComponents
> (from the GraphUtilities package) for random graphs (e.g.
> RandomInteger[50000, {30000, 2}]). It seems that components[] is faste=
r
> than WeakComponents for as long as the graph doesn't have very large
> connected components. However, as soon as large connected components
> appear, components[] slows down a lot.
>
> I looked into the source of WeakComponents to find out how it works, but
> it turns out it uses undocumented functions, such as
> SparseArray`StronglyConnectedComponents
>
> The reason for the slowdown of components[] when large connected
> components are present is that the f[] function needs to be evaluated in
> several steps. E.g. for the graph {{1,2},{2,3},{3,4}}, the definition
> of f will include f[1]=f[2], f[2]=f[3], f[3]=f[4], so the evaluatio=
n of
> f[1] will take 3 steps.
>
> I tried to remedy this by changing f so that it re-defines itself each
> time the left-hand-side of a particular definition can be evaluated
> further. With the above example, evaluating f[1] would cause the
> definition of f[1] to change from f[1]=f[2] to f[1]=f[4] (as f[2]
> evaluates to f[4]). Here's the solution:
>
> setSpecial[lhs_, rhs_] /; rhs =!= lhs :=
> (lhs := With[{val = #1}, lhs := #0[val]; val] &[rhs])
>
> components2[lst_List] :=
> Module[{f},
> Do[setSpecial @@ f /@ pair, {pair, lst}];
> GatherBy[Union @@ lst, f]
> ]
>
> This modified components2[] seems to be faster than WeakComponents[]
> even for single-component random graphs, however, it is limited by
> $RecursionLimit (which can't be increased indefinitely without risking a
> crash)
>
> Szabolcs
>
> P.S. Here's the code I used to compare the speed of components[] and
> WeakComponents[]. For 'a' greater than about 0.5 components[] gets slo=
w.
>
> a = 0.7;
>
> tw = Table[
> g = RandomInteger[n, {Ceiling[a n], 2}];
> {n, First@Timing[WeakComponents[Rule @@@ g]]},
> {n, 2^Range[11, 16]}
> ]
>
> tc = Table[
> g = RandomInteger[n, {Ceiling[a n], 2}];
> {n, First@Timing[components[g]]},
> {n, 2^Range[11, 16]}
> ]
>
> tc2 = Table[
> g = RandomInteger[n, {Ceiling[a n], 2}];
> {n, First@Timing[components2[g]]},
> {n, 2^Range[11, 16]}
> ]
>
> ListLogLogPlot[{tw, tc, tc2}, Joined -> True,
> PlotMarkers -> Automatic]
The WeakComponents solution can be improved because it spends most of
the time generating the adjacency matrix. This is significantly
faster:
DeleteCases[#, {_}] &@
WeakComponents[SparseArray[data -> 1, {Max@data, Max@data}]]
Maxim Rytin
m.r at inbox.ru
- References:
- Finding Clusters
- From: fd <fdimer@gmail.com>
- Finding Clusters