[Date Index]
[Thread Index]
[Author Index]
Re: Finding Clusters
*To*: mathgroup at smc.vnet.net
*Subject*: [mg104698] Re: Finding Clusters
*From*: Leonid Shifrin <lshifr at gmail.com>
*Date*: Sat, 7 Nov 2009 06:47:58 -0500 (EST)
*References*: <200911030751.CAA01018@smc.vnet.net>
Hi all,
I came pretty late to the party, but let me add my voice to those delighted
by Fred's solution and Szabolcs's improvement. These are two of the most
brilliant and mind-blowing hacks I ever saw.
I would just like to add to this, that let us not underestimate the power of
Compile. Here is an implementation of weight-balancing path-compression
union-find algorithm taken straight from the Sedgewick's book, with some
minor changes as needed for adoption to Mathematica:
Clear[getTree];
getTree =
Compile[{{pairs, _Integer, 2}},
Module[{t = 0, i = 0, j = 0, xl = 0, yl = 0, k = 0,
len = Max[pairs], dad = Table[0, {Max[pairs]}],
present = Table[-1, {Max[pairs]}]},
For[k = 1, k <= Length[pairs], k++,
xl = i = pairs[[k, 1]];
yl = j = pairs[[k, 2]];
present[[i]] = present[[j]] = 1;
If[xl == yl, Continue[]];
While[dad[[i]] > 0, i = dad[[i]]];
While[dad[[j]] > 0, j = dad[[j]]];
While[dad[[xl]] > 0, t = xl; xl = dad[[xl]]; dad[[t]] = i];
While[dad[[yl]] > 0, t = yl; yl = dad[[yl]]; dad[[t]] = j];
If[i != j,
If[dad[[j]] <= dad[[i]],
If[dad[[i]] > 0, dad[[j]] = dad[[j]] + dad[[i]] - 1];
dad[[i]] = j;,
(* else *)
If[dad[[j]] > 0, (dad[[i]] = dad[[i]] + dad[[j]] - 1)];
dad[[j]] = i];
];
];
For[k = 1, k <= len, k++,
If[dad[[k]] == 0, dad[[k]] = k]];
{dad, present}]];
Clear[getComponents];
getComponents[pairs_] :=
Module[{dad, present},
{dad, present} = getTree[pairs];
Select[
GatherBy[
Transpose[{FixedPoint[dad[[#]] &, dad], Range[Length[dad]]}],
First][[All, All, 2]],
Length[#] > 1 || present[[First@#]] == 1 &]];
It would need less code and run faster yet if we didn't allow same-vertex
pairs like {1,1}. It is also limited by memory consumption - it allocates
the list of dimension equal to the maximum vertex label, so it will be a
waste if vertex set is a sparse array of large numbers. Also, vertex labels
must be strictly positive. Both of the last two limitations can be dealt
with by some sort of re-labeling - I did not include that.
Here is Szabolcs's comparison where I added timing for my version and did
some cosmetic changes (not that zero vertex numbers are not generated,
unlike the original comparison of Szabolcs):
In[1]:=
Clear[res1,res2,res0,i0,i1,i2];
i1=0;i2=0;i0=0;
In[2]:=
data = Table[RandomInteger[{1,n},{Ceiling[a n],2}],{n,2^Range[11,16]}];
In[3]:=
tc=(First@Timing[res0[i0++]=components[#]]&)/@data
Out[3]= {0.032,0.078,0.297,1.156,4.547,19.562}
In[4]:=
tc2=(First@Timing[res1[i1++]=components2[#]]&)/@data
Out[4]= {0.047,0.109,0.235,0.5,1.046,2.36}
In[5]:=
tc3=(First@Timing[res2[i2++]=getComponents[#]]&)/@data
Out[5]= {0.,0.031,0.031,0.078,0.188,0.391}
In[6]:= ListLogLogPlot[{tc,tc2,tc3},Joined->True,PlotMarkers->Automatic]
In[7]:=
res0[#]===res1[#]===res2[#]&/@Range[0,5]
Out[7]= {True,True,True,True,True,True}
Here <components> corresponds to Fred's solution, and <components2> to
Szabolcs's.
I have a feeling that the present compiled version and Szabolcs's one are
doing similar type of things (at least in spirit), by attempting to flatten
the emerging element trees at run-time. The speed-up achieved by compiled
version I attribute to the large time constant associated with hash
setting/lookup (assignments and DownValue definition lookup as compared to
compiled version of array indexing), plus very high efficiency of Part when
many elements are extracted at once - the fact that I exploit in the
FixedPoint construct.
Regards,
Leonid
On Fri, Nov 6, 2009 at 1:17 PM, Fred Simons <f.h.simons at tue.nl> wrote:
> I feel I have to remark that with respect to me there is nothing
> brilliant in the solution I posted. The idea behind it was the result of
> a discussion in this group, many, many years ago, on a similar problem.
> But anyway, it is a very beautiful result!
>
> The real brilliant thing is the improvement that was given by Szabolcs
> Horvat in [mg104644].
>
> Fred
>
>
> DrMajorBob wrote:
> > Brilliant as usual, Fred.
> >
> > I did think intervals only needed to overlap to "correspond", whereas
> your
> > solution requires them to share an end-point.
> >
> > For instance, in this example the first interval is a subset of the
> second
> > interval and overlaps with the third, yet "components' associates it with
> > neither.
> >
> > event = {{1, 2} + 1/2, {1, 3}, {3, 4}, {5, 6}, {7, 8}, {8, 10}};
> > components@event
> >
> > {{1, 3, 4}, {3/2, 5/2}, {5, 6}, {7, 8, 10}}
> >
> > I can't really guess the OP's intent.
> >
> > Bobby
> >
> > On Wed, 04 Nov 2009 00:33:25 -0600, Fred Simons <f.h.simons at tue.nl>
> wrote:
> >
> >
> >> Here is a very short, very fast but not very simple solution:
> >>
> >> components[lst_List] := Module[{f},
> >> Do[Set @@ f /@ pair, {pair, lst}]; GatherBy[Union @@ lst, f]]
> >>
> >> Fred Simons
> >> Eindhoven University of Technology
> >>
> >>> All.
> >>>
> >>> I have a list which represents some natural event. These events are
> >>> listed pair-wise, which corresponds to event happening within certain
> >>> time interval from each other, as below
> >>>
> >>> event={{1,2},{1,3},{3,4},{5,6},{7,8},{8,10}}
> >>>
> >>> I wish to find a thread of events, i.e. if event A is related to B and
> >>> B to C, I wish to group {A,B,C} together. For the example above I
> >>> would have
> >>>
> >>> {{1,2,3,4},{5,6},{7,8,10}}
> >>>
> >>> This would correspond to do a Graph Plot and identifying the parts
> >>> which are disconnected It should be simple but I'm really finding it
> >>> troublesome.
> >>>
> >>>
> >>>
> >>>
> >
> >
> >
>
>
>
Prev by Date:
**Re: How to make floating control elements?**
Next by Date:
**Re: find subsets**
Previous by thread:
**Re: Re: Finding Clusters**
Next by thread:
**Re: Finding Clusters**
| |