Re: Re: Re: aggregation of related elements in a list

*To*: mathgroup at smc.vnet.net*Subject*: [mg61793] Re: [mg61762] Re: [mg61730] Re: aggregation of related elements in a list*From*: danl at wolfram.com*Date*: Sun, 30 Oct 2005 00:43:22 -0400 (EDT)*References*: <djcgcs$6du$1@smc.vnet.net> <djfnjr$b1i$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

> [From Carl Woll:] > I decided to compare my approach and another which I thought up > previously with Maxim's approach in a bit more detail below. My > conclusion was that it should be possible to do things quite a bit > quicker using SparseArray objects instead of using the > ConnectedComponents function from Combinatorica. > > Maxim wrote: >> On Sun, 23 Oct 2005 10:11:07 +0000 (UTC), Carl K. Woll >> <carlw at u.washington.edu> wrote: >> >> >>>leigh pascoe wrote: >>> >>>>Dear Mathgroup, >>>> >>>>I would like to construct a function to transform a list of integers >>>>into a list of lists of related members. Suppose we have the finite >>>> list >>>> >>>>{x1,x2,x3,..........xN} >>>> >>>>and a test function ( relatedQ[x_,y_,n_] ) that decides whether two >>>>elements are related. The actual test may vary, but returns True or >>>>False for any two elements as a function of n (n is an independent >>>>variable not the number of elements in the list}. >>>> >>>>I would like to process this list to aggregate those members that are >>>>related either directly to each other or to a common element. For >>>>example if x1 is unrelated to any other element in the list, x2 is >>>>related to x5 but not x7 and x5 is related to x7 etc. would produce the >>>>list of lists >>>> >>>>{{x1},{x2,x5,x7),.......} >>>> >>>>To take a specific example, assume we have the list >>>> >>>>list={1,2,6,7,8,9,11,16} >>>> >>>>and that application of the relatedQ, test to all possible pairs yields >>>>the list of related pairs >>>> >>>>prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}} >>>> >>>>Using the list and the list of related pairs we can deduce the desired >>>>list of lists as >>>> >>>>{{1,2,6,7,8,11,16},{9}} >>>> >>> >>>Here's one suboptimal method >>>[...] >>>Carl Woll >>>Wolfram Research >>[From Maxim Rytin:] >> If I understand the problem correctly, this amounts to finding connected >> components in a graph -- there was a similar question asked on MathGroup >> a >> while ago. >> >> In[1]:= >> <<discretemath` >> agg[L_, $Lpair_] := Module[ >> {Lpair = $Lpair, LLind}, >> Lpair = Lpair /. Thread[L -> Range@ Length@ L]; >> LLind = ConnectedComponents[ >> FromUnorderedPairs[Lpair, CircularEmbedding[Length@ L]]]; >> Extract[L, List /@ #]& /@ LLind >> ] >> >> In[3]:= >> agg[{1, 2, 6, 7, 8, 9, 11, 16}, >> {{1, 2}, {1, 6}, {2, 6}, {2, 7}, {6, 7}, >> {6, 11}, {7, 8}, {7, 11}, {11, 16}}] >> >> Out[3]= >> {{1, 2, 6, 7, 8, 11, 16}, {9}} >> >> In[4]:= >> agg2[L_, Lpair_] := Module[ >> {n = Max@ L, sp1, sp2, t}, >> sp1 = SparseArray[Thread[Lpair -> 1], {n, n}]; >> sp2 = SparseArray[{#, #} -> 1& /@ L, {n, n}]; >> t = sp1 + sp2 + Transpose@ sp1; >> Flatten@ Position[#, 1]& /@ Union@ Normal@ >> FixedPoint[Sign[#.#]&, t, SameTest -> Equal] // >> DeleteCases[#, {}]& >> ] >> >> In[5]:= >> n = 500; >> pairs = Array[Random[Integer, {1, n}]&, {n, 2}]; >> (ans = agg[Range@ n, pairs];) // Timing >> (ans2 = agg2[Range@ n, pairs];) // Timing >> Sort@ ans === Sort@ ans2 >> >> Out[7]= {0.062 Second, Null} >> >> Out[8]= {2.469 Second, Null} >> >> Out[9]= True >> [...] >> Maxim Rytin >> m.r at inbox.ru > [From Carl Woll:] > After I proposed my solution, I realized that for input which creates > large components my solution was very inefficient, as you state above. A > simple remedy is to work with a single element, and find each component > by using matrix vector multiplication. In fact, I believe something like > this is exactly what ConnectedComponents from Combinatorica does. > > It still seems to me that using SparseArray objects ought to be quicker. > Here is your function, modified to assume that the n elements are the > integers from 1 to n, as suggested by Danny Lichtblau: > > agg0[n_, Lpair_] := Module[{LLind}, > LLind = ConnectedComponents[ > FromUnorderedPairs[Lpair, CircularEmbedding[n]]]; > Sort[Extract[Range[n], List /@ #] & /@ LLind]] > > Here is my original idea, modified to use SameTest->Equal as you mention > above: > > agg1[n_, pairs_] := Module[{sp, t}, > sp = SparseArray[Thread[pairs -> 1], {n, n}]; > t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]]; > Union[nonzeros /@ FixedPoint[Sign[#.#]& , t, SameTest -> Equal]] > ] > > I use the following helper function: > > nonzeros[a_SparseArray] := a /. SparseArray[_, _, _, x_] :> > Flatten[x[[2, 2]]] > > Finally, here is a function which finds each component one at a time to > avoid SparseArray objects which aren't really sparse. Basically, the > algorithm of ConnectedComponents using SparseArray objects: > > agg2[n_, pairs_] := Module[{sp, t, candidates, nextcomp}, > sp = SparseArray[Thread[pairs -> 1], {n, n}]; > t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]]; > candidates = Range[n]; > Sort[ > Reap[ > While[ > candidates != {}, > v = SparseArray[candidates[[1]] -> 1, n]; > nextcomp = nonzeros[FixedPoint[Sign[t.#1]&,v,SameTest->Equal]]; > candidates = Complement[candidates, Sow[nextcomp]]; > ]; > ][[2, 1]] > ] > ] > > Let's do some comparisons: > > In[56]:= > n=100; > p=Table[Random[Integer,{1,n}],{n},{2}]; > > In[62]:= > r1=agg0[n,p];//Timing > r2=agg1[n,p];//Timing > r3=agg2[n,p];//Timing > r1===r2===r3 > > Out[62]= > {0. Second,Null} > > Out[63]= > {0.031 Second,Null} > > Out[64]= > {0. Second,Null} > > Out[65]= > True > > Good they all agree. Now, for a larger sample set: > > In[69]:= > n=3000; > p=Table[Random[Integer,{1,n}],{n},{2}]; > > In[71]:= > r1=agg0[n,p];//Timing > (*r2=agg1[n,p];//Timing*) > r3=agg2[n,p];//Timing > r1===r3 > > Out[71]= > {1.015 Second,Null} > > Out[72]= > {0.282 Second,Null} > > Out[73]= > True > > We see that agg2 is about 4 times faster than agg0 here. I don't bother > to try agg1 because it probably will never finish. Some statistics on > the sample set: > > In[76]:= > {Max[Length/@r1],Length[r1]} > > Out[76]= > {2410,473} > [...] > [...] > Carl Woll > Wolfram Research If I admit Floyd-Warshall was overkill and far too slow for this problem, can I play again? For set of m pairs giving connections between n elements, the code below has complexty O(n+m*log(m)). One can get rid of the log factor as the sorting is not strictly needed, but in practice it is not the bottleneck and actually tends to improve speed. aggregate[n_, pairs_] := Module[ {hh, aggs, kk, ll, mm, spairs = Sort[Map[Sort, pairs]], fm}, aggs = Map[hh, Range[n]]; Do[ {kk, mm} = spairs[[j]]; ll = First[hh[kk]]; fm = First[hh[mm]]; If[fm === mm, hh[mm] = hh[ll], If[ll < fm, hh[mm] = hh[ll]; hh[First[hh[fm]]] = hh[ll], hh[ll] = hh[mm]; ll = fm]]; , {j, Length[spairs]}]; Last[Reap[Do[ll = hh[j]; Sow[j, ll], {j, n}]]] ] The idea is to use a "marker function" (hh) to record the current smallest element related to the one in question. Note that if I set hh[7] tp hh[3], and later learn that hh[3] should become hh[1] because element 3 is found to be related to element 1, then the assignment hh[3]=hh[1] suffices to change it everywhere hh[3] may be found. So there is no backtracing needed. We loop once over the pairs to create associations, and once over the list elements to put associated elements into buckets. n = 1000; pairs = Table[Random[Integer, {1, n}], {n}, {2}]; In[65]:= Sort[agg0[n,pairs]]===Sort[agg2[n,pairs]]=== Sort[aggregate[n,pairs]] Out[65]= True We check the speed: n = 10000; pairs = Table[Random[Integer, {1, n}], {n}, {2}]; In[59]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];], Timing[aggregate[n,pairs];]} Out[59]= {{4.593 Second,Null},{1.297 Second,Null}, {0.297 Second,Null}} Now double the size: n = 20000; pairs = Table[Random[Integer, {1, n}], {n}, {2}]; In[62]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];], Timing[aggregate[n,pairs];]} Out[62]= {{28.609 Second,Null},{5.657 Second,Null}, {0.656 Second,Null}} Daniel Lichtblau Wolfram Research

**Follow-Ups**:**Re: Re: Re: Re: aggregation of related elements in a list***From:*leigh pascoe <leigh@cephb.fr>

**References**:**Re: aggregation of related elements in a list***From:*Maxim <ab_def@prontomail.com>

**Re: Re: aggregation of related elements in a list***From:*"Carl K. Woll" <carlw@wolfram.com>