MathGroup Archive 2006

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Faster ways to unionize intersecting sets?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg70373] Re: Faster ways to unionize intersecting sets?
  • From: "dkr" <dkrjeg at adelphia.net>
  • Date: Sat, 14 Oct 2006 03:07:58 -0400 (EDT)
  • References: <eg02a3$81m$1@smc.vnet.net>

Here is a rules based approach which mimics your procedural approach:

In[1]:=
intersectQ[s1_, s2_] := If[Intersection[s1, s2] =!= {}, True, False];
mergeSetsdkr[s:{__List}]:=
    Reap[{{},Sequence@@
              s}//.{  {h_[f1___],b1_List,b2_List,b___}/;

intersectQ[b1,b2]:>{hit[f1,Union[b1,b2]],b1,b},{h_[f1___],
                b1_List,b2_List,b___} :>{h[f1,b2],b1,b},{hit[f1___],

b1_List}:>{{},f1},{{f1___},b1_List}:>(Sow[b1];{{},f1})}][[2,
        1]];

Initially, an empty list is prepended to the list s.  The first element
in the patterns will be used as a temporary storage container, and will
also keep track if the first rule is used.  There are 4 rules.  Suppose
s consists of at least two elements.  Initially the pattern h_ matches
the head List, f1 is the null sequence, with b1_ and b2_ matching the
first two lists in s, and b matching the sequence of remaining lists in
s.  If  the lists matching b1_ and b2_  intersect, then the head of the
first element is set to hit, Union[b1,b2] is added to the storage
container (whose head is now hit), and b2 is dropped from the sequence
of lists.  If there is no intersection (rule 2), then the head of the
storage container remains whatever it was, b2 is added to it and
dropped from the remaining sequence of lists.  The process continues,
using the first two rules, until only the list matching b1_ remains.
If the third rule matches (whose first element has the head hit), it
means that somewhere along the line the list matching b1 intersected
another list that appeared next to it.  (Rule 1 is the only rule that
could have changed the first element's head from List to hit.)
Consequently the list matching b1_ is not a single.  In this case, the
first element is reset to {}, and the sequence of lists f1 is recovered
from the storage container, and the process continues.  Otherwise the
fourth rule applies, whose first element has the head List
(consequently the first rule could not have been used), and thus the
list matching b1 is a single, and so is sown.

Below we give a few timing results comparing your approach, the above
approach, and the elegant rules based approach suggested by Peter Pein
in this thread.

In[3]:=
mergeSets[s_List] := Module[
 {h, r, singles, club, cnt},
 cnt = Length[s];
 If[cnt < 2, Return[s]];
 singles = {};
 club = s;
While[cnt >= 2,
 h = club[[1]];
 r = Rest[club];
 hit = 0;
 club = If[intersectQ[h, #], hit = 1; Union[h, #], #] & /@ r;
 If[hit == 0, singles = Append[singles, h]];
 --cnt;
 ];
 Join[singles, club]
 ] ;

mergeSetsPEIN[s_List]:=
    s//.{a___,x_List,b___,y_List,c___}/;
          Intersection[x,y]=!={}\[RuleDelayed]{a,Union[x,y],b,c};

In[5]:=
SeedRandom[1234];
generateData[numUpperBound_,sublistLengthLowerBound_,sublistLengthUpperBound_,
      numSublists_]:=
    Table[Table[
        Random[Integer,{1,numUpperBound}],{Random[
            Integer,{sublistLengthLowerBound,
              sublistLengthUpperBound}]}],{numSublists}];
data1=generateData[1000,1,20,100];
data2=generateData[1000,1,10,100];
data3=generateData[1000,1,20,200];
data4=generateData[1000,1,5,200];
data5=generateData[1000,1,80,50];
data6=generateData[1000,50,200,50];

In[13]:=
First/@{Timing[mergeSets[data1]],Timing[mergeSetsdkr[data1]],
    Timing[mergeSetsPEIN[data1]]}
Out[13]=
{1.99 Second,1.77 Second,1.47 Second}
In[14]:=
First/@{Timing[mergeSets[data2]],Timing[mergeSetsdkr[data2]],
    Timing[mergeSetsPEIN[data2]]}
Out[14]=
{0.53 Second,0.3 Second,0.38 Second}
In[15]:=
First/@{Timing[mergeSets[data3]],Timing[mergeSetsdkr[data3]],
    Timing[mergeSetsPEIN[data3]]}
Out[15]=
{11.67 Second,11.88 Second,3.15 Second}
In[16]:=
First/@{Timing[mergeSets[data4]],Timing[mergeSetsdkr[data4]],
    Timing[mergeSetsPEIN[data4]]}
Out[16]=
{0.83 Second,0.83 Second,2.26 Second}
In[17]:=
First/@{Timing[mergeSets[data5]],Timing[mergeSetsdkr[data5]],
    Timing[mergeSetsPEIN[data5]]}
Out[17]=
{1.14 Second,0.89 Second,3.13 Second}
In[18]:=
First/@{Timing[mergeSets[data6]],Timing[mergeSetsdkr[data6]],
    Timing[mergeSetsPEIN[data6]]}
Out[18]=
{1.63 Second,1.37 Second,4.15 Second}

dkr


  • Prev by Date: Re: IntervalComplement
  • Next by Date: Convert expression to polynomial
  • Previous by thread: Re: Re: Faster ways to unionize intersecting sets?
  • Next by thread: Re: Faster ways to unionize intersecting sets?