MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: aggregation of related elements in a list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61703] Re: [mg61542] aggregation of related elements in a list
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Wed, 26 Oct 2005 01:02:05 -0400 (EDT)
  • References: <200510220435.AAA06197@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

leigh pascoe wrote:
> Dear Mathgroup,
> 
> I would like to construct a function to transform a list of integers 
> into a list of lists of related members. Suppose we have the finite list
> 
> {x1,x2,x3,..........xN}
> 
> and a test function ( relatedQ[x_,y_,n_] ) that decides whether two 
> elements are related. The actual test may vary, but returns True or 
> False for any two elements as a function of n (n is an independent 
> variable not the number of elements in the list}.
> 
> I would like to process this list to aggregate those members that are 
> related either directly to each other or to a common element. For 
> example if x1 is unrelated to any other element in the list, x2 is 
> related to x5 but not x7 and x5 is related to x7 etc. would produce the 
> list of lists
> 
> {{x1},{x2,x5,x7),.......}
> 
> To take a specific example, assume we have the list
> 
> list={1,2,6,7,8,9,11,16}
> 
> and that application of the relatedQ, test to all possible pairs yields 
> the list of related pairs
> 
> prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}}
> 
> Using the list and the list of related pairs we can deduce the desired 
> list of lists as
> 
> {{1,2,6,7,8,11,16},{9}}
> 
> If 6 were not in the original list we would get instead
> 
> {{1,2},{7,8,11,16},{9}}
> 
> I have tried to develop a Mathematica function or procedure to produce 
> this list, but am having trouble achieving it. Can anyone help?
> 
> The algorithm I have been trying to implement is:
> - form a list of all related pairs and singletons in the original list 
> by comparing all pairs to prlist
> - then compare the first element in the resulting list with each of the 
> others in that list and form the Union of all those where (Intersection 
> !={})
> - delete all pairs retained for the Union.
> - repeat the process for the next element in the list and so on.
> 
> The following ugly code seems to work on some lists, but with others 
> gives me an error message, presumably an objection to a dynamic 
> redefinition of the list inside a Do statement.
> 
> In[87]:=
> chains[n_]:=Block[{lst,tmp,tmp2,lst2},
>     lst={1,2,6,7,8,9,11,16};
>     tmp=Flatten[
>         Part[Reap[
>             Do[{If[Intersection[{{Part[lst,i],Part[lst,j]}},
>                       Edges[gr1[n]]]\[NotEqual]{},
>                   
> Sow[{Part[lst,i],Part[lst,j]}]],Sow},{i,1,Length[lst]-1},{j,
>                 i+1,Length[lst]}]],2],1];
>     tmp2=Part[
>         Reap[Do[If[Intersection[{lst[[i]]},Flatten[tmp]]\[Equal]{},
>               Sow[lst[[i]]]],{i,1,Length[lst]}]],2];
>     lst2=Union[tmp,tmp2];
>     Do
>       [
>       {
>         If
>           [
>           Intersection[lst2[[i]],lst2[[j]]]\[NotEqual]{},
>           {lst2[[i]]=Union[lst2[[i]],lst2[[j]]],lst2=Delete[lst2,j]}
>           ]
>         },
>       {i,1,Length[lst2]-1},
>       {j,i+1,Length[lst2]}
>       ];
>     lst2
>     ]
> chains[4]
> I have been doing much Intersecting, Unions, Deletions, nested loops and 
> Sowing, but have so far been unable to Reap the result that I want! 
> There should be a simple recursive function that will do what I want, 
> but I can't find it.
> 
> Thanks for any help
> 
> LP

This amounts to finding the transitive closure of the connectivity 
matrix. Here are a couple of ways to do this. We start with the 
assumption that you have n elements, represented as the integers 
Range[n]. If not, simple replacement rules can be used to make this the 
case.

(1) You can add powers of the connectivity matrix, up to n (or until 
there are no new path connections). This is the idea behind Carl Woll's 
approach. It also takes advantage of sparse matrix multiplication; in 
cases where there are many disjoint connectivity classes this can be 
advantageous. But at worst it is O(n^4), where n is the number of 
elements. When finished, just look at the Sign of entries because we 
only want to know if a connection exists, not how many paths between 
elements there might be.

(1b) Instead one can work with the dense matrix but using approximate 
arithmetic. Using a diagonalization by eigensystem it is easy to sum 
powers of the amtrix without forming explicit matrix products. The 
drawback to this approach is that, in order to make it work, I had to 
resort to increasing precision, so it is generally no better (and indeed 
is possibly worse) than method 1 implemented with exact matrix products.

aggregateList[n_, pairs_] := Module[
   {mat, vals, vecs, vals2, v2, iv2, mat2},
   mat = Normal[SparseArray[Thread[pairs->1], {n,n}]];
   mat = Sign[(mat + Transpose[mat])];
   Do[mat[[j,j]] = 0, {j,n}];
   {vals,vecs} = Eigensystem[N[mat,2*n/3]];
   vals = Chop[vals,10^(-n/4)];
   v2 = Conjugate[Transpose[vecs]];
   iv2 = Inverse[v2];
   vals2 = 1+Sum[(vals)^j,{j,1,n}];
   mat2 = v2.(vals2*iv2);
   mat2 = Sign[Chop[mat2,10^(-2)]];
   Union[Map[Flatten[Position[#,1]]&, mat2]]
   ]

As an example I take 100 elements and form 100 random connections 
between pairs.

In[355]:= n = 100;

In[357]:= InputForm[pairs = Table[Random[Integer,{1,n}], {n}, {2}]]
Out[357]//InputForm=
{{1, 80}, {27, 39}, {65, 13}, {41, 46}, {18, 90}, {35, 87}, {87, 81},
  {67, 28}, {74, 95}, {17, 68}, {16, 40}, {24, 91}, {47, 30}, {53, 21},
  {68, 76}, {39, 20}, {1, 31}, {77, 66}, {82, 94}, {17, 56}, {7, 63},
  {54, 33}, {82, 28}, {3, 15}, {12, 12}, {7, 80}, {61, 74}, {89, 49},
  {81, 98}, {1, 72}, {83, 87}, {63, 42}, {1, 37}, {94, 4}, {87, 54},
  {45, 34},
  {40, 42}, {80, 86}, {2, 70}, {15, 14}, {94, 90}, {2, 100}, {24, 58},
  {43, 98}, {73, 32}, {7, 90}, {12, 39}, {49, 64}, {38, 23}, {61, 21},
  {73, 9}, {29, 77}, {59, 26}, {13, 53}, {30, 77}, {56, 10}, {46, 1},
  {100, 88}, {38, 78}, {33, 50}, {38, 41}, {43, 63}, {26, 23}, {2, 51},
  {68, 33}, {38, 15}, {85, 90}, {51, 82}, {74, 39}, {41, 5}, {11, 33},
  {68, 45}, {3, 87}, {97, 99}, {40, 48}, {19, 88}, {33, 24}, {15, 64},
  {47, 67}, {49, 24}, {86, 90}, {25, 89}, {76, 78}, {66, 22}, {75, 9},
  {28, 2}, {58, 58}, {63, 60}, {99, 33}, {91, 16}, {50, 32}, {82, 89},
  {6, 17}, {17, 36}, {30, 98}, {71, 13}, {71, 77}, {19, 19}, {3, 82},
  {38, 30}}

In[358]:= Timing[classes1 = aggregateList[n, pairs]]
Out[358]= {12.3581 Second, {{8}, {44}, {52}, {55}, {57}, {62},
   {69}, {79},
     {84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12,
   13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
   28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
   43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 56, 58, 59, 60,
   61, 63, 64, 65, 66, 67, 68, 70, 71, 72, 73, 74, 75, 76, 77,
   78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90,
   91, 94, 95, 97, 98, 99, 100}}}

As a sanity check we note that there are exactly 100 elements in total 
across all equivalence classes.

In[359]:= Apply[Plus,Map[Length,classes1]]
Out[359]= 100

(2) Use a modification of the Floyd-Warshall shortest path algorithm. 
This is O(n^3) in complexity. It has other advantages e.g. it can be 
used for directed, weighted graphs. Below is a version I coded for the 
purpose at hand. I use Compile for speed gain, andpost process outside 
the Compile because the result is not a tensorial array.

FloydWarshallC = Compile[
   {{n,_Integer},{graph,_Integer,2}},
   Module[{dist, i1, i2},
   dist = Table[n+1, {n}, {n}];
   Do [i1 = graph[[j,1]]; i2 = graph[[j,2]];
     dist[[i1,i2]] = 1; dist[[i2,i1]] = 1;
	,{j,Length[graph]}];
   Do[dist[[j,j]] = 1, {j,n}];
   Do [dist[[i,j]] = Min[dist[[i,j]], dist[[i,k]] + dist[[k,j]]];
	, {k,n}, {i,n}, {j,n}];
   Sign[dist-(n+1)]
   ]]

aggregateFloydWarshall[n_,graph_] :=
   Union[Map[Flatten[Position[#,-1]]&, FloydWarshallC[n,graph]]]

In[372]:= InputForm[Timing[classes2 = aggregateFloydWarshall[n, pairs]]]
Out[372]//InputForm=
{1.0738360000000036*Second, {{8}, {44}, {52}, {55}, {57},
  {62}, {69}, {79},
   {84}, {92}, {93}, {96}, {1, 2, 3, 4, 5, 6, 7, 9, 10,
    11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 32, 33,
    34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47,
    48, 49, 50, 51, 53,
    54, 56, 58, 59, 60, 61, 63, 64, 65, 66, 67, 68, 70,
    71, 72, 73, 74, 75,
    76, 77, 78, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90,
    91, 94, 95, 97, 98, 99, 100}}}

Compare with the previous result (who do you believe, me or your lyin' 
eyes?).

In[373]:= classes1 === classes2
Out[373]= True

Let's see if the complexity is in the right ballpark.

In[374]:= n = 300;

In[375]:= pairs300 = Table[Random[Integer,{1,n}], {n}, {2}];

In[376]:= Timing[classes300 = aggregateFloydWarshall[n, pairs300];]
Out[376]= {28.5507 Second, Null}

We tripled the size and the time increased by a factor of 26.6, quite 
close to 3^3.


Daniel Lichtblau
Wolfram Research


  • Prev by Date: Re: Re: JLink discover and link to an already running kernel from a Java App
  • Next by Date: IMAP interface to Mathematica
  • Previous by thread: aggregation of related elements in a list
  • Next by thread: Re: aggregation of related elements in a list