MathGroup Archive: October 2005 [00927]

[Date Index] [Thread Index] [Author Index]

Re: Re: Re: aggregation of related elements in a list

To: mathgroup at smc.vnet.net
Subject: [mg61793] Re: [mg61762] Re: [mg61730] Re: aggregation of related elements in a list
From: danl at wolfram.com
Date: Sun, 30 Oct 2005 00:43:22 -0400 (EDT)
References: <djcgcs$6du$1@smc.vnet.net> <djfnjr$b1i$1@smc.vnet.net> <200510270902.FAA19490@smc.vnet.net> <200510280725.DAA08769@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

> [From Carl Woll:]
> I decided to compare my approach and another which I thought up
> previously with Maxim's approach in a bit more detail below. My
> conclusion was that it should be possible to do things quite a bit
> quicker using SparseArray objects instead of using the
> ConnectedComponents function from Combinatorica.
>
> Maxim wrote:
>> On Sun, 23 Oct 2005 10:11:07 +0000 (UTC), Carl K. Woll
>> <carlw at u.washington.edu> wrote:
>>
>>
>>>leigh pascoe wrote:
>>>
>>>>Dear Mathgroup,
>>>>
>>>>I would like to construct a function to transform a list of integers
>>>>into a list of lists of related members. Suppose we have the finite
>>>> list
>>>>
>>>>{x1,x2,x3,..........xN}
>>>>
>>>>and a test function ( relatedQ[x_,y_,n_] ) that decides whether two
>>>>elements are related. The actual test may vary, but returns True or
>>>>False for any two elements as a function of n (n is an independent
>>>>variable not the number of elements in the list}.
>>>>
>>>>I would like to process this list to aggregate those members that are
>>>>related either directly to each other or to a common element. For
>>>>example if x1 is unrelated to any other element in the list, x2 is
>>>>related to x5 but not x7 and x5 is related to x7 etc. would produce the
>>>>list of lists
>>>>
>>>>{{x1},{x2,x5,x7),.......}
>>>>
>>>>To take a specific example, assume we have the list
>>>>
>>>>list={1,2,6,7,8,9,11,16}
>>>>
>>>>and that application of the relatedQ, test to all possible pairs yields
>>>>the list of related pairs
>>>>
>>>>prlist={{1,2},{1,6},{2,6},{2,7},{6,7},{6,11},{7,8},{7,11},{11,16}}
>>>>
>>>>Using the list and the list of related pairs we can deduce the desired
>>>>list of lists as
>>>>
>>>>{{1,2,6,7,8,11,16},{9}}
>>>>
>>>
>>>Here's one suboptimal method
>>>[...]
>>>Carl Woll
>>>Wolfram Research

>>[From Maxim Rytin:]
>> If I understand the problem correctly, this amounts to finding connected
>> components in a graph -- there was a similar question asked on MathGroup
>> a
>> while ago.
>>
>> In[1]:=
>> <<discretemath`
>> agg[L_, $Lpair_] := Module[
>>    {Lpair = $Lpair, LLind},
>>    Lpair = Lpair /. Thread[L -> Range@ Length@ L];
>>    LLind = ConnectedComponents[
>>      FromUnorderedPairs[Lpair, CircularEmbedding[Length@ L]]];
>>    Extract[L, List /@ #]& /@ LLind
>> ]
>>
>> In[3]:=
>> agg[{1, 2, 6, 7, 8, 9, 11, 16},
>>      {{1, 2}, {1, 6}, {2, 6}, {2, 7}, {6, 7},
>>       {6, 11}, {7, 8}, {7, 11}, {11, 16}}]
>>
>> Out[3]=
>> {{1, 2, 6, 7, 8, 11, 16}, {9}}
>>
>> In[4]:=
>> agg2[L_, Lpair_] := Module[
>>    {n = Max@ L, sp1, sp2, t},
>>    sp1 = SparseArray[Thread[Lpair -> 1], {n, n}];
>>    sp2 = SparseArray[{#, #} -> 1& /@ L, {n, n}];
>>    t = sp1 + sp2 + Transpose@ sp1;
>>    Flatten@ Position[#, 1]& /@ Union@ Normal@
>>        FixedPoint[Sign[#.#]&, t, SameTest -> Equal] //
>>      DeleteCases[#, {}]&
>> ]
>>
>> In[5]:=
>> n = 500;
>> pairs = Array[Random[Integer, {1, n}]&, {n, 2}];
>> (ans = agg[Range@ n, pairs];) // Timing
>> (ans2 = agg2[Range@ n, pairs];) // Timing
>> Sort@ ans === Sort@ ans2
>>
>> Out[7]= {0.062 Second, Null}
>>
>> Out[8]= {2.469 Second, Null}
>>
>> Out[9]= True
>> [...]
>> Maxim Rytin
>> m.r at inbox.ru

> [From Carl Woll:]
> After I proposed my solution, I realized that for input which creates
> large components my solution was very inefficient, as you state above. A
> simple remedy is to work with a single element, and find each component
> by using matrix vector multiplication. In fact, I believe something like
> this is exactly what ConnectedComponents from Combinatorica does.
>
> It still seems to me that using SparseArray objects ought to be quicker.
> Here is your function, modified to assume that the n elements are the
> integers from 1 to n, as suggested by Danny Lichtblau:
>
> agg0[n_, Lpair_] := Module[{LLind},
>    LLind = ConnectedComponents[
>        FromUnorderedPairs[Lpair, CircularEmbedding[n]]];
>    Sort[Extract[Range[n], List /@ #] & /@ LLind]]
>
> Here is my original idea, modified to use SameTest->Equal as you mention
> above:
>
> agg1[n_, pairs_] := Module[{sp, t},
>    sp = SparseArray[Thread[pairs -> 1], {n, n}];
>    t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]];
>    Union[nonzeros /@ FixedPoint[Sign[#.#]& , t, SameTest -> Equal]]
>    ]
>
> I use the following helper function:
>
> nonzeros[a_SparseArray] := a /. SparseArray[_, _, _, x_] :>
>    Flatten[x[[2, 2]]]
>
> Finally, here is a function which finds each component one at a time to
> avoid SparseArray objects which aren't really sparse. Basically, the
> algorithm of ConnectedComponents using SparseArray objects:
>
> agg2[n_, pairs_] := Module[{sp, t, candidates, nextcomp},
>    sp = SparseArray[Thread[pairs -> 1], {n, n}];
>    t = Sign[sp + SparseArray[{i_, i_} -> 1, {n, n}] + Transpose[sp]];
>    candidates = Range[n];
>    Sort[
>      Reap[
>        While[
>          candidates != {},
>          v = SparseArray[candidates[[1]] -> 1, n];
>          nextcomp = nonzeros[FixedPoint[Sign[t.#1]&,v,SameTest->Equal]];
>          candidates = Complement[candidates, Sow[nextcomp]];
>          ];
>      ][[2, 1]]
>     ]
>    ]
>
> Let's do some comparisons:
>
> In[56]:=
> n=100;
> p=Table[Random[Integer,{1,n}],{n},{2}];
>
> In[62]:=
> r1=agg0[n,p];//Timing
> r2=agg1[n,p];//Timing
> r3=agg2[n,p];//Timing
> r1===r2===r3
>
> Out[62]=
> {0. Second,Null}
>
> Out[63]=
> {0.031 Second,Null}
>
> Out[64]=
> {0. Second,Null}
>
> Out[65]=
> True
>
> Good they all agree. Now, for a larger sample set:
>
> In[69]:=
> n=3000;
> p=Table[Random[Integer,{1,n}],{n},{2}];
>
> In[71]:=
> r1=agg0[n,p];//Timing
> (*r2=agg1[n,p];//Timing*)
> r3=agg2[n,p];//Timing
> r1===r3
>
> Out[71]=
> {1.015 Second,Null}
>
> Out[72]=
> {0.282 Second,Null}
>
> Out[73]=
> True
>
> We see that agg2 is about 4 times faster than agg0 here. I don't bother
> to try agg1 because it probably will never finish. Some statistics on
> the sample set:
>
> In[76]:=
> {Max[Length/@r1],Length[r1]}
>
> Out[76]=
> {2410,473}
> [...]
> [...]
> Carl Woll
> Wolfram Research

If I admit Floyd-Warshall was overkill and far too slow for this problem,
can I play again?

For set of m pairs giving connections between n elements, the code below
has complexty O(n+m*log(m)). One can get rid of the log factor as the
sorting is not strictly needed, but in practice it is not the bottleneck
and actually tends to improve speed.

aggregate[n_, pairs_] := Module[
  {hh, aggs, kk, ll, mm, spairs = Sort[Map[Sort, pairs]], fm},
  aggs = Map[hh, Range[n]];
  Do[
   {kk, mm} = spairs[[j]];
   ll = First[hh[kk]];
   fm = First[hh[mm]];
   If[fm === mm,
    hh[mm] = hh[ll],
    If[ll < fm,
     hh[mm] = hh[ll];
     hh[First[hh[fm]]] = hh[ll],
     hh[ll] = hh[mm]; ll = fm]];
   , {j, Length[spairs]}];
  Last[Reap[Do[ll = hh[j]; Sow[j, ll], {j, n}]]]
  ]

The idea is to use a "marker function" (hh) to record the current smallest
element related to the one in question. Note that if I set hh[7] tp hh[3],
and later learn that hh[3] should become hh[1] because element 3 is found
to be related to element 1, then the assignment hh[3]=hh[1] suffices to
change it everywhere hh[3] may be found. So there is no backtracing
needed. We loop once over the pairs to create associations, and once over
the list elements to put associated elements into buckets.

n = 1000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In[65]:= Sort[agg0[n,pairs]]===Sort[agg2[n,pairs]]===
  Sort[aggregate[n,pairs]]
Out[65]= True

We check the speed:

n = 10000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In[59]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
  Timing[aggregate[n,pairs];]}
Out[59]= {{4.593 Second,Null},{1.297 Second,Null},
  {0.297 Second,Null}}

Now double the size:

n = 20000;
pairs = Table[Random[Integer, {1, n}], {n}, {2}];

In[62]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
  Timing[aggregate[n,pairs];]}
Out[62]= {{28.609 Second,Null},{5.657 Second,Null},
  {0.656 Second,Null}}


Daniel Lichtblau
Wolfram Research

Follow-Ups:
- Re: Re: Re: Re: aggregation of related elements in a list
  - From: leigh pascoe <leigh@cephb.fr>

References:
- Re: aggregation of related elements in a list
  - From: Maxim <ab_def@prontomail.com>
- Re: Re: aggregation of related elements in a list
  - From: "Carl K. Woll" <carlw@wolfram.com>

Prev by Date: Re: Why? Warning: Actions not found: delete-next-character

Next by Date: Re: this is ridiculous...

Previous by thread: Re: Re: aggregation of related elements in a list

Next by thread: Re: Re: Re: Re: aggregation of related elements in a list