       Re: Re: Re: aggregation of related elements in a list

• To: mathgroup at smc.vnet.net
• Subject: [mg61793] Re: [mg61762] Re: [mg61730] Re: aggregation of related elements in a list
• From: danl at wolfram.com
• Date: Sun, 30 Oct 2005 00:43:22 -0400 (EDT)
• References: <djcgcs\$6du\$1@smc.vnet.net> <djfnjr\$b1i\$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```> [From Carl Woll:]
> I decided to compare my approach and another which I thought up
> previously with Maxim's approach in a bit more detail below. My
> conclusion was that it should be possible to do things quite a bit
> quicker using SparseArray objects instead of using the
> ConnectedComponents function from Combinatorica.
>
> Maxim wrote:
>> On Sun, 23 Oct 2005 10:11:07 +0000 (UTC), Carl K. Woll
>> <carlw at u.washington.edu> wrote:
>>
>>
>>>leigh pascoe wrote:
>>>
>>>>Dear Mathgroup,
>>>>
>>>>I would like to construct a function to transform a list of integers
>>>>into a list of lists of related members. Suppose we have the finite
>>>> list
>>>>
>>>>{x1,x2,x3,..........xN}
>>>>
>>>>and a test function ( relatedQ[x_,y_,n_] ) that decides whether two
>>>>elements are related. The actual test may vary, but returns True or
>>>>False for any two elements as a function of n (n is an independent
>>>>variable not the number of elements in the list}.
>>>>
>>>>I would like to process this list to aggregate those members that are
>>>>related either directly to each other or to a common element. For
>>>>example if x1 is unrelated to any other element in the list, x2 is
>>>>related to x5 but not x7 and x5 is related to x7 etc. would produce the
>>>>list of lists
>>>>
>>>>{{x1},{x2,x5,x7),.......}
>>>>
>>>>To take a specific example, assume we have the list
>>>>
>>>>list={1,2,6,7,8,9,11,16}
>>>>
>>>>and that application of the relatedQ, test to all possible pairs yields
>>>>the list of related pairs
>>>>
>>>>prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}}
>>>>
>>>>Using the list and the list of related pairs we can deduce the desired
>>>>list of lists as
>>>>
>>>>{{1,2,6,7,8,11,16},{9}}
>>>>
>>>
>>>Here's one suboptimal method
>>>[...]
>>>Carl Woll
>>>Wolfram Research

>>[From Maxim Rytin:]
>> If I understand the problem correctly, this amounts to finding connected
>> components in a graph -- there was a similar question asked on MathGroup
>> a
>> while ago.
>>
>> In:=
>> <<discretemath`
>> agg[L_, \$Lpair_] := Module[
>>    {Lpair = \$Lpair, LLind},
>>    Lpair = Lpair /. Thread[L -> Range@ Length@ L];
>>    LLind = ConnectedComponents[
>>      FromUnorderedPairs[Lpair, CircularEmbedding[Length@ L]]];
>>    Extract[L, List /@ #]& /@ LLind
>> ]
>>
>> In:=
>> agg[{1, 2, 6, 7, 8, 9, 11, 16},
>>      {{1, 2}, {1, 6}, {2, 6}, {2, 7}, {6, 7},
>>       {6, 11}, {7, 8}, {7, 11}, {11, 16}}]
>>
>> Out=
>> {{1, 2, 6, 7, 8, 11, 16}, {9}}
>>
>> In:=
>> agg2[L_, Lpair_] := Module[
>>    {n = Max@ L, sp1, sp2, t},
>>    sp1 = SparseArray[Thread[Lpair -> 1], {n, n}];
>>    sp2 = SparseArray[{#, #} -> 1& /@ L, {n, n}];
>>    t = sp1 + sp2 + Transpose@ sp1;
>>    Flatten@ Position[#, 1]& /@ Union@ Normal@
>>        FixedPoint[Sign[#.#]&, t, SameTest -> Equal] //
>>      DeleteCases[#, {}]&
>> ]
>>
>> In:=
>> n = 500;
>> pairs = Array[Random[Integer, {1, n}]&, {n, 2}];
>> (ans = agg[Range@ n, pairs];) // Timing
>> (ans2 = agg2[Range@ n, pairs];) // Timing
>> Sort@ ans === Sort@ ans2
>>
>> Out= {0.062 Second, Null}
>>
>> Out= {2.469 Second, Null}
>>
>> Out= True
>> [...]
>> Maxim Rytin
>> m.r at inbox.ru

> [From Carl Woll:]
> After I proposed my solution, I realized that for input which creates
> large components my solution was very inefficient, as you state above. A
> simple remedy is to work with a single element, and find each component
> by using matrix vector multiplication. In fact, I believe something like
> this is exactly what ConnectedComponents from Combinatorica does.
>
> It still seems to me that using SparseArray objects ought to be quicker.
> Here is your function, modified to assume that the n elements are the
> integers from 1 to n, as suggested by Danny Lichtblau:
>
> agg0[n_, Lpair_] := Module[{LLind},
>    LLind = ConnectedComponents[
>        FromUnorderedPairs[Lpair, CircularEmbedding[n]]];
>    Sort[Extract[Range[n], List /@ #] & /@ LLind]]
>
> Here is my original idea, modified to use SameTest->Equal as you mention
> above:
>
> agg1[n_, pairs_] := Module[{sp, t},
>    sp = SparseArray[Thread[pairs -> 1], {n, n}];
>    t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]];
>    Union[nonzeros /@ FixedPoint[Sign[#.#]& , t, SameTest -> Equal]]
>    ]
>
> I use the following helper function:
>
> nonzeros[a_SparseArray] := a /. SparseArray[_, _, _, x_] :>
>    Flatten[x[[2, 2]]]
>
> Finally, here is a function which finds each component one at a time to
> avoid SparseArray objects which aren't really sparse. Basically, the
> algorithm of ConnectedComponents using SparseArray objects:
>
> agg2[n_, pairs_] := Module[{sp, t, candidates, nextcomp},
>    sp = SparseArray[Thread[pairs -> 1], {n, n}];
>    t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]];
>    candidates = Range[n];
>    Sort[
>      Reap[
>        While[
>          candidates != {},
>          v = SparseArray[candidates[] -> 1, n];
>          nextcomp = nonzeros[FixedPoint[Sign[t.#1]&,v,SameTest->Equal]];
>          candidates = Complement[candidates, Sow[nextcomp]];
>          ];
>      ][[2, 1]]
>     ]
>    ]
>
> Let's do some comparisons:
>
> In:=
> n=100;
> p=Table[Random[Integer,{1,n}],{n},{2}];
>
> In:=
> r1=agg0[n,p];//Timing
> r2=agg1[n,p];//Timing
> r3=agg2[n,p];//Timing
> r1===r2===r3
>
> Out=
> {0. Second,Null}
>
> Out=
> {0.031 Second,Null}
>
> Out=
> {0. Second,Null}
>
> Out=
> True
>
> Good they all agree. Now, for a larger sample set:
>
> In:=
> n=3000;
> p=Table[Random[Integer,{1,n}],{n},{2}];
>
> In:=
> r1=agg0[n,p];//Timing
> (*r2=agg1[n,p];//Timing*)
> r3=agg2[n,p];//Timing
> r1===r3
>
> Out=
> {1.015 Second,Null}
>
> Out=
> {0.282 Second,Null}
>
> Out=
> True
>
> We see that agg2 is about 4 times faster than agg0 here. I don't bother
> to try agg1 because it probably will never finish. Some statistics on
> the sample set:
>
> In:=
> {Max[Length/@r1],Length[r1]}
>
> Out=
> {2410,473}
> [...]
> [...]
> Carl Woll
> Wolfram Research

If I admit Floyd-Warshall was overkill and far too slow for this problem,
can I play again?

For set of m pairs giving connections between n elements, the code below
has complexty O(n+m*log(m)). One can get rid of the log factor as the
sorting is not strictly needed, but in practice it is not the bottleneck
and actually tends to improve speed.

aggregate[n_, pairs_] := Module[
{hh, aggs, kk, ll, mm, spairs = Sort[Map[Sort, pairs]], fm},
aggs = Map[hh, Range[n]];
Do[
{kk, mm} = spairs[[j]];
ll = First[hh[kk]];
fm = First[hh[mm]];
If[fm === mm,
hh[mm] = hh[ll],
If[ll < fm,
hh[mm] = hh[ll];
hh[First[hh[fm]]] = hh[ll],
hh[ll] = hh[mm]; ll = fm]];
, {j, Length[spairs]}];
Last[Reap[Do[ll = hh[j]; Sow[j, ll], {j, n}]]]
]

The idea is to use a "marker function" (hh) to record the current smallest
element related to the one in question. Note that if I set hh tp hh,
and later learn that hh should become hh because element 3 is found
to be related to element 1, then the assignment hh=hh suffices to
change it everywhere hh may be found. So there is no backtracing
needed. We loop once over the pairs to create associations, and once over
the list elements to put associated elements into buckets.

n = 1000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In:= Sort[agg0[n,pairs]]===Sort[agg2[n,pairs]]===
Sort[aggregate[n,pairs]]
Out= True

We check the speed:

n = 10000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
Timing[aggregate[n,pairs];]}
Out= {{4.593 Second,Null},{1.297 Second,Null},
{0.297 Second,Null}}

Now double the size:

n = 20000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
Timing[aggregate[n,pairs];]}
Out= {{28.609 Second,Null},{5.657 Second,Null},
{0.656 Second,Null}}

Daniel Lichtblau
Wolfram Research

```