MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: Re: Re: aggregation of related elements in a list

danl at wrote:


>If I admit Floyd-Warshall was overkill and far too slow for this problem,
>can I play again?
>For set of m pairs giving connections between n elements, the code below
>has complexty O(n+m*log(m)). One can get rid of the log factor as the
>sorting is not strictly needed, but in practice it is not the bottleneck
>and actually tends to improve speed.
>aggregate[n_, pairs_] := Module[
>  {hh, aggs, kk, ll, mm, spairs = Sort[Map[Sort, pairs]], fm},
>  aggs = Map[hh, Range[n]];
>  Do[
>   {kk, mm} = spairs[[j]];
>   ll = First[hh[kk]];
>   fm = First[hh[mm]];
>   If[fm === mm,
>    hh[mm] = hh[ll],
>    If[ll < fm,
>     hh[mm] = hh[ll];
>     hh[First[hh[fm]]] = hh[ll],
>     hh[ll] = hh[mm]; ll = fm]];
>   , {j, Length[spairs]}];
>  Last[Reap[Do[ll = hh[j]; Sow[j, ll], {j, n}]]]
>  ]
>The idea is to use a "marker function" (hh) to record the current smallest
>element related to the one in question. Note that if I set hh[7] tp hh[3],
>and later learn that hh[3] should become hh[1] because element 3 is found
>to be related to element 1, then the assignment hh[3]=hh[1] suffices to
>change it everywhere hh[3] may be found. So there is no backtracing
>needed. We loop once over the pairs to create associations, and once over
>the list elements to put associated elements into buckets.
>n = 1000;
>pairs = Table[Random[Integer, {1, n}], {n}, {2}];
>In[65]:= Sort[agg0[n,pairs]]===Sort[agg2[n,pairs]]===
>  Sort[aggregate[n,pairs]]
>Out[65]= True
>We check the speed:
>n = 10000;
>pairs = Table[Random[Integer, {1, n}], {n}, {2}];
>In[59]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
>  Timing[aggregate[n,pairs];]}
>Out[59]= {{4.593 Second,Null},{1.297 Second,Null},
>  {0.297 Second,Null}}
>Now double the size:
>n = 20000;
>pairs = Table[Random[Integer, {1, n}], {n}, {2}];
>In[62]:= {Timing[agg0[n,pairs];],Timing[agg2[n,pairs];],
>  Timing[aggregate[n,pairs];]}
>Out[62]= {{28.609 Second,Null},{5.657 Second,Null},
>  {0.656 Second,Null}}
>Daniel Lichtblau
>Wolfram Research
I very much like this solution, which follows what one might do 
manually, and is also efficient. It resembles the Flood-fill algorithm, 
which is O(n^2), but by some Ma magic is only linear in n. Using the 
existing function ConnectedComponents has the virtue of clarity from the 
point of view of reading a program. In any case thanks again to all who 
responded to my question.

Surely there will be no further improvements??


  • Prev by Date: pattern matching question
  • Next by Date: Re: Exporting notebooks to xml (cnxml)
  • Previous by thread: Re: Re: Re: aggregation of related elements in a list
  • Next by thread: Re: Re: aggregation of related elements in a list