Re: aggregation of related elements in a list

• To: mathgroup at smc.vnet.net
• Subject: [mg61703] Re: [mg61542] aggregation of related elements in a list
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Wed, 26 Oct 2005 01:02:05 -0400 (EDT)
• References: <200510220435.AAA06197@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```leigh pascoe wrote:
> Dear Mathgroup,
>
> I would like to construct a function to transform a list of integers
> into a list of lists of related members. Suppose we have the finite list
>
> {x1,x2,x3,..........xN}
>
> and a test function ( relatedQ[x_,y_,n_] ) that decides whether two
> elements are related. The actual test may vary, but returns True or
> False for any two elements as a function of n (n is an independent
> variable not the number of elements in the list}.
>
> I would like to process this list to aggregate those members that are
> related either directly to each other or to a common element. For
> example if x1 is unrelated to any other element in the list, x2 is
> related to x5 but not x7 and x5 is related to x7 etc. would produce the
> list of lists
>
> {{x1},{x2,x5,x7),.......}
>
> To take a specific example, assume we have the list
>
> list={1,2,6,7,8,9,11,16}
>
> and that application of the relatedQ, test to all possible pairs yields
> the list of related pairs
>
> prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}}
>
> Using the list and the list of related pairs we can deduce the desired
> list of lists as
>
> {{1,2,6,7,8,11,16},{9}}
>
> If 6 were not in the original list we would get instead
>
> {{1,2},{7,8,11,16},{9}}
>
> I have tried to develop a Mathematica function or procedure to produce
> this list, but am having trouble achieving it. Can anyone help?
>
> The algorithm I have been trying to implement is:
> - form a list of all related pairs and singletons in the original list
> by comparing all pairs to prlist
> - then compare the first element in the resulting list with each of the
> others in that list and form the Union of all those where (Intersection
> !={})
> - delete all pairs retained for the Union.
> - repeat the process for the next element in the list and so on.
>
> The following ugly code seems to work on some lists, but with others
> gives me an error message, presumably an objection to a dynamic
> redefinition of the list inside a Do statement.
>
> In[87]:=
> chains[n_]:=Block[{lst,tmp,tmp2,lst2},
>     lst={1,2,6,7,8,9,11,16};
>     tmp=Flatten[
>         Part[Reap[
>             Do[{If[Intersection[{{Part[lst,i],Part[lst,j]}},
>                       Edges[gr1[n]]]\[NotEqual]{},
>
> Sow[{Part[lst,i],Part[lst,j]}]],Sow},{i,1,Length[lst]-1},{j,
>                 i+1,Length[lst]}]],2],1];
>     tmp2=Part[
>         Reap[Do[If[Intersection[{lst[[i]]},Flatten[tmp]]\[Equal]{},
>               Sow[lst[[i]]]],{i,1,Length[lst]}]],2];
>     lst2=Union[tmp,tmp2];
>     Do
>       [
>       {
>         If
>           [
>           Intersection[lst2[[i]],lst2[[j]]]\[NotEqual]{},
>           {lst2[[i]]=Union[lst2[[i]],lst2[[j]]],lst2=Delete[lst2,j]}
>           ]
>         },
>       {i,1,Length[lst2]-1},
>       {j,i+1,Length[lst2]}
>       ];
>     lst2
>     ]
> chains[4]
> I have been doing much Intersecting, Unions, Deletions, nested loops and
> Sowing, but have so far been unable to Reap the result that I want!
> There should be a simple recursive function that will do what I want,
> but I can't find it.
>
> Thanks for any help
>
> LP

This amounts to finding the transitive closure of the connectivity
matrix. Here are a couple of ways to do this. We start with the
assumption that you have n elements, represented as the integers
Range[n]. If not, simple replacement rules can be used to make this the
case.

(1) You can add powers of the connectivity matrix, up to n (or until
there are no new path connections). This is the idea behind Carl Woll's
approach. It also takes advantage of sparse matrix multiplication; in
cases where there are many disjoint connectivity classes this can be
advantageous. But at worst it is O(n^4), where n is the number of
elements. When finished, just look at the Sign of entries because we
only want to know if a connection exists, not how many paths between
elements there might be.

(1b) Instead one can work with the dense matrix but using approximate
arithmetic. Using a diagonalization by eigensystem it is easy to sum
powers of the amtrix without forming explicit matrix products. The
drawback to this approach is that, in order to make it work, I had to
resort to increasing precision, so it is generally no better (and indeed
is possibly worse) than method 1 implemented with exact matrix products.

aggregateList[n_, pairs_] := Module[
{mat, vals, vecs, vals2, v2, iv2, mat2},
mat = Sign[(mat + Transpose[mat])];
Do[mat[[j,j]] = 0, {j,n}];
{vals,vecs} = Eigensystem[N[mat,2*n/3]];
vals = Chop[vals,10^(-n/4)];
v2 = Conjugate[Transpose[vecs]];
iv2 = Inverse[v2];
vals2 = 1+Sum[(vals)^j,{j,1,n}];
mat2 = v2.(vals2*iv2);
mat2 = Sign[Chop[mat2,10^(-2)]];
Union[Map[Flatten[Position[#,1]]&, mat2]]
]

As an example I take 100 elements and form 100 random connections
between pairs.

In[355]:= n = 100;

In[357]:= InputForm[pairs = Table[Random[Integer,{1,n}], {n}, {2}]]
Out[357]//InputForm=
{{1, 80}, {27, 39}, {65, 13}, {41, 46}, {18, 90}, {35, 87}, {87, 81},
{67, 28}, {74, 95}, {17, 68}, {16, 40}, {24, 91}, {47, 30}, {53, 21},
{68, 76}, {39, 20}, {1, 31}, {77, 66}, {82, 94}, {17, 56}, {7, 63},
{54, 33}, {82, 28}, {3, 15}, {12, 12}, {7, 80}, {61, 74}, {89, 49},
{81, 98}, {1, 72}, {83, 87}, {63, 42}, {1, 37}, {94, 4}, {87, 54},
{45, 34},
{40, 42}, {80, 86}, {2, 70}, {15, 14}, {94, 90}, {2, 100}, {24, 58},
{43, 98}, {73, 32}, {7, 90}, {12, 39}, {49, 64}, {38, 23}, {61, 21},
{73, 9}, {29, 77}, {59, 26}, {13, 53}, {30, 77}, {56, 10}, {46, 1},
{100, 88}, {38, 78}, {33, 50}, {38, 41}, {43, 63}, {26, 23}, {2, 51},
{68, 33}, {38, 15}, {85, 90}, {51, 82}, {74, 39}, {41, 5}, {11, 33},
{68, 45}, {3, 87}, {97, 99}, {40, 48}, {19, 88}, {33, 24}, {15, 64},
{47, 67}, {49, 24}, {86, 90}, {25, 89}, {76, 78}, {66, 22}, {75, 9},
{28, 2}, {58, 58}, {63, 60}, {99, 33}, {91, 16}, {50, 32}, {82, 89},
{6, 17}, {17, 36}, {30, 98}, {71, 13}, {71, 77}, {19, 19}, {3, 82},
{38, 30}}

In[358]:= Timing[classes1 = aggregateList[n, pairs]]
Out[358]= {12.3581 Second, {{8}, {44}, {52}, {55}, {57}, {62},
{69}, {79},
{84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 56, 58, 59, 60,
61, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77,
78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90,
91, 94, 95, 97, 98, 99, 100}}}

As a sanity check we note that there are exactly 100 elements in total
across all equivalence classes.

In[359]:= Apply[Plus,Map[Length,classes1]]
Out[359]= 100

(2) Use a modification of the Floyd-Warshall shortest path algorithm.
This is O(n^3) in complexity. It has other advantages e.g. it can be
used for directed, weighted graphs. Below is a version I coded for the
purpose at hand. I use Compile for speed gain, andpost process outside
the Compile because the result is not a tensorial array.

FloydWarshallC = Compile[
{{n,_Integer},{graph,_Integer,2}},
Module[{dist, i1, i2},
dist = Table[n+1, {n}, {n}];
Do [i1 = graph[[j,1]]; i2 = graph[[j,2]];
dist[[i1,i2]] = 1; dist[[i2,i1]] = 1;
,{j,Length[graph]}];
Do[dist[[j,j]] = 1, {j,n}];
Do [dist[[i,j]] = Min[dist[[i,j]], dist[[i,k]] + dist[[k,j]]];
, {k,n}, {i,n}, {j,n}];
Sign[dist-(n+1)]
]]

aggregateFloydWarshall[n_,graph_] :=
Union[Map[Flatten[Position[#,-1]]&, FloydWarshallC[n,graph]]]

In[372]:= InputForm[Timing[classes2 = aggregateFloydWarshall[n, pairs]]]
Out[372]//InputForm=
{1.0738360000000036*Second, {{8}, {44}, {52}, {55}, {57},
{62}, {69}, {79},
{84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10,
11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47,
48, 49, 50, 51, 53,
54, 56, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 70,
71, 72, 73, 74, 75,
76, 77, 78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90,
91, 94, 95, 97, 98, 99, 100}}}

Compare with the previous result (who do you believe, me or your lyin'
eyes?).

In[373]:= classes1 === classes2
Out[373]= True

Let's see if the complexity is in the right ballpark.

In[374]:= n = 300;

In[375]:= pairs300 = Table[Random[Integer,{1,n}], {n}, {2}];

In[376]:= Timing[classes300 = aggregateFloydWarshall[n, pairs300];]
Out[376]= {28.5507 Second, Null}

We tripled the size and the time increased by a factor of 26.6, quite
close to 3^3.

Daniel Lichtblau
Wolfram Research

```

• Prev by Date: Re: Re: JLink discover and link to an already running kernel from a Java App
• Next by Date: IMAP interface to Mathematica
• Previous by thread: aggregation of related elements in a list
• Next by thread: Re: aggregation of related elements in a list