Re: aggregation of related elements in a list

*To*: mathgroup at smc.vnet.net*Subject*: [mg61703] Re: [mg61542] aggregation of related elements in a list*From*: Daniel Lichtblau <danl at wolfram.com>*Date*: Wed, 26 Oct 2005 01:02:05 -0400 (EDT)*References*: <200510220435.AAA06197@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

leigh pascoe wrote: > Dear Mathgroup, > > I would like to construct a function to transform a list of integers > into a list of lists of related members. Suppose we have the finite list > > {x1,x2,x3,..........xN} > > and a test function ( relatedQ[x_,y_,n_] ) that decides whether two > elements are related. The actual test may vary, but returns True or > False for any two elements as a function of n (n is an independent > variable not the number of elements in the list}. > > I would like to process this list to aggregate those members that are > related either directly to each other or to a common element. For > example if x1 is unrelated to any other element in the list, x2 is > related to x5 but not x7 and x5 is related to x7 etc. would produce the > list of lists > > {{x1},{x2,x5,x7),.......} > > To take a specific example, assume we have the list > > list={1,2,6,7,8,9,11,16} > > and that application of the relatedQ, test to all possible pairs yields > the list of related pairs > > prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}} > > Using the list and the list of related pairs we can deduce the desired > list of lists as > > {{1,2,6,7,8,11,16},{9}} > > If 6 were not in the original list we would get instead > > {{1,2},{7,8,11,16},{9}} > > I have tried to develop a Mathematica function or procedure to produce > this list, but am having trouble achieving it. Can anyone help? > > The algorithm I have been trying to implement is: > - form a list of all related pairs and singletons in the original list > by comparing all pairs to prlist > - then compare the first element in the resulting list with each of the > others in that list and form the Union of all those where (Intersection > !={}) > - delete all pairs retained for the Union. > - repeat the process for the next element in the list and so on. > > The following ugly code seems to work on some lists, but with others > gives me an error message, presumably an objection to a dynamic > redefinition of the list inside a Do statement. > > In[87]:= > chains[n_]:=Block[{lst,tmp,tmp2,lst2}, > lst={1,2,6,7,8,9,11,16}; > tmp=Flatten[ > Part[Reap[ > Do[{If[Intersection[{{Part[lst,i],Part[lst,j]}}, > Edges[gr1[n]]]\[NotEqual]{}, > > Sow[{Part[lst,i],Part[lst,j]}]],Sow},{i,1,Length[lst]-1},{j, > i+1,Length[lst]}]],2],1]; > tmp2=Part[ > Reap[Do[If[Intersection[{lst[[i]]},Flatten[tmp]]\[Equal]{}, > Sow[lst[[i]]]],{i,1,Length[lst]}]],2]; > lst2=Union[tmp,tmp2]; > Do > [ > { > If > [ > Intersection[lst2[[i]],lst2[[j]]]\[NotEqual]{}, > {lst2[[i]]=Union[lst2[[i]],lst2[[j]]],lst2=Delete[lst2,j]} > ] > }, > {i,1,Length[lst2]-1}, > {j,i+1,Length[lst2]} > ]; > lst2 > ] > chains[4] > I have been doing much Intersecting, Unions, Deletions, nested loops and > Sowing, but have so far been unable to Reap the result that I want! > There should be a simple recursive function that will do what I want, > but I can't find it. > > Thanks for any help > > LP This amounts to finding the transitive closure of the connectivity matrix. Here are a couple of ways to do this. We start with the assumption that you have n elements, represented as the integers Range[n]. If not, simple replacement rules can be used to make this the case. (1) You can add powers of the connectivity matrix, up to n (or until there are no new path connections). This is the idea behind Carl Woll's approach. It also takes advantage of sparse matrix multiplication; in cases where there are many disjoint connectivity classes this can be advantageous. But at worst it is O(n^4), where n is the number of elements. When finished, just look at the Sign of entries because we only want to know if a connection exists, not how many paths between elements there might be. (1b) Instead one can work with the dense matrix but using approximate arithmetic. Using a diagonalization by eigensystem it is easy to sum powers of the amtrix without forming explicit matrix products. The drawback to this approach is that, in order to make it work, I had to resort to increasing precision, so it is generally no better (and indeed is possibly worse) than method 1 implemented with exact matrix products. aggregateList[n_, pairs_] := Module[ {mat, vals, vecs, vals2, v2, iv2, mat2}, mat = Normal[SparseArray[Thread[pairs->1], {n,n}]]; mat = Sign[(mat + Transpose[mat])]; Do[mat[[j,j]] = 0, {j,n}]; {vals,vecs} = Eigensystem[N[mat,2*n/3]]; vals = Chop[vals,10^(-n/4)]; v2 = Conjugate[Transpose[vecs]]; iv2 = Inverse[v2]; vals2 = 1+Sum[(vals)^j,{j,1,n}]; mat2 = v2.(vals2*iv2); mat2 = Sign[Chop[mat2,10^(-2)]]; Union[Map[Flatten[Position[#,1]]&, mat2]] ] As an example I take 100 elements and form 100 random connections between pairs. In[355]:= n = 100; In[357]:= InputForm[pairs = Table[Random[Integer,{1,n}], {n}, {2}]] Out[357]//InputForm= {{1, 80}, {27, 39}, {65, 13}, {41, 46}, {18, 90}, {35, 87}, {87, 81}, {67, 28}, {74, 95}, {17, 68}, {16, 40}, {24, 91}, {47, 30}, {53, 21}, {68, 76}, {39, 20}, {1, 31}, {77, 66}, {82, 94}, {17, 56}, {7, 63}, {54, 33}, {82, 28}, {3, 15}, {12, 12}, {7, 80}, {61, 74}, {89, 49}, {81, 98}, {1, 72}, {83, 87}, {63, 42}, {1, 37}, {94, 4}, {87, 54}, {45, 34}, {40, 42}, {80, 86}, {2, 70}, {15, 14}, {94, 90}, {2, 100}, {24, 58}, {43, 98}, {73, 32}, {7, 90}, {12, 39}, {49, 64}, {38, 23}, {61, 21}, {73, 9}, {29, 77}, {59, 26}, {13, 53}, {30, 77}, {56, 10}, {46, 1}, {100, 88}, {38, 78}, {33, 50}, {38, 41}, {43, 63}, {26, 23}, {2, 51}, {68, 33}, {38, 15}, {85, 90}, {51, 82}, {74, 39}, {41, 5}, {11, 33}, {68, 45}, {3, 87}, {97, 99}, {40, 48}, {19, 88}, {33, 24}, {15, 64}, {47, 67}, {49, 24}, {86, 90}, {25, 89}, {76, 78}, {66, 22}, {75, 9}, {28, 2}, {58, 58}, {63, 60}, {99, 33}, {91, 16}, {50, 32}, {82, 89}, {6, 17}, {17, 36}, {30, 98}, {71, 13}, {71, 77}, {19, 19}, {3, 82}, {38, 30}} In[358]:= Timing[classes1 = aggregateList[n, pairs]] Out[358]= {12.3581 Second, {{8}, {44}, {52}, {55}, {57}, {62}, {69}, {79}, {84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 56, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 91, 94, 95, 97, 98, 99, 100}}} As a sanity check we note that there are exactly 100 elements in total across all equivalence classes. In[359]:= Apply[Plus,Map[Length,classes1]] Out[359]= 100 (2) Use a modification of the Floyd-Warshall shortest path algorithm. This is O(n^3) in complexity. It has other advantages e.g. it can be used for directed, weighted graphs. Below is a version I coded for the purpose at hand. I use Compile for speed gain, andpost process outside the Compile because the result is not a tensorial array. FloydWarshallC = Compile[ {{n,_Integer},{graph,_Integer,2}}, Module[{dist, i1, i2}, dist = Table[n+1, {n}, {n}]; Do [i1 = graph[[j,1]]; i2 = graph[[j,2]]; dist[[i1,i2]] = 1; dist[[i2,i1]] = 1; ,{j,Length[graph]}]; Do[dist[[j,j]] = 1, {j,n}]; Do [dist[[i,j]] = Min[dist[[i,j]], dist[[i,k]] + dist[[k,j]]]; , {k,n}, {i,n}, {j,n}]; Sign[dist-(n+1)] ]] aggregateFloydWarshall[n_,graph_] := Union[Map[Flatten[Position[#,-1]]&, FloydWarshallC[n,graph]]] In[372]:= InputForm[Timing[classes2 = aggregateFloydWarshall[n, pairs]]] Out[372]//InputForm= {1.0738360000000036*Second, {{8}, {44}, {52}, {55}, {57}, {62}, {69}, {79}, {84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 56, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 91, 94, 95, 97, 98, 99, 100}}} Compare with the previous result (who do you believe, me or your lyin' eyes?). In[373]:= classes1 === classes2 Out[373]= True Let's see if the complexity is in the right ballpark. In[374]:= n = 300; In[375]:= pairs300 = Table[Random[Integer,{1,n}], {n}, {2}]; In[376]:= Timing[classes300 = aggregateFloydWarshall[n, pairs300];] Out[376]= {28.5507 Second, Null} We tripled the size and the time increased by a factor of 26.6, quite close to 3^3. Daniel Lichtblau Wolfram Research

**References**:**aggregation of related elements in a list***From:*leigh pascoe <leigh@cephb.fr>