MathGroup Archive: December 2002 [00151]

[Date Index] [Thread Index] [Author Index]
RE: Re: Efficient Sorting Algorithm
To: mathgroup at smc.vnet.net
Subject: [mg38217] RE: [mg38165] Re: Efficient Sorting Algorithm
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
Date: Thu, 5 Dec 2002 03:31:54 -0500 (EST)
Sender: owner-wri-mathgroup at wolfram.com
Dear Allan,

I was quite astonished about the performance of your proposed algorithm, as
I also was about the misperformance of mine (I had not tested, just believed
Mathematica will do. But introducing a test in Intersection appears to kill
its performance). Nonetheless we might combine ideas to get something still
faster.

(To keep all this legible I will snip out in the following part of your
deduction and examples)


>-----Original Message-----
>From: Allan Hayes [mailto:hay at haystack.demon.co.uk]
To: mathgroup at smc.vnet.net
>Sent: Wednesday, December 04, 2002 9:25 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg38217] [mg38165] Re: Efficient Sorting Algorithm
>
>
>Brian,
>
>> I have two lists (in general of different length) that have the
>> following structure
>
>    s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>         Random[Integer, {1, 2000}]}, {100}];
>     s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>        Random[Integer, {1, 2000}]}, {200}];
>
>> I am then interested in obtaining  all the strings in s1 that match
>> with those in s2. At the present time I use the following agorithm to
>> find the matches
>[I evaluate and time]
>
>
>    (myList = Flatten[Outer[List, s1, s2, 1], 1];
>    r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing
>
>
>
> {2.58 Second, --X--}
>
>I  may be misunderstanding what you are after, but if it is 
>the menbers of
>s1 that have the same first element (string) as a  member of 
>s2 then the
>following seems quite quick
>
>(r2=s1[[Flatten[Position[
>              Transpose[s1][[1]],
>              Alternatives@@(Transpose[s2][[1]])]]]])//Timing
>
>{0.06 Second, --X--}
>
>There is a difference:
>
--X--
>
>This is caused by the occurence of
>    {{"BEAB", 716}, {"BEAB", 1071}},
>    {{"BEAB", 716}, {"BEAB", 1098}},
>    {{"BEAB", 716}, {"BEAB", 1966}}
>
>in r1 compared to
>
>    {"BEAB", 716}
>
>in r2
>
>
>However, we can get the same matching information as from the Outer
>technique with the following quick steps
>
Here I just sewed together your steps to a procedure (for short, with an
obvious renaming): 

myMatch2[s1_, s2_] := Module[{r1, rr1, r2, rr2},
    r1 = s1[[Flatten[
            Position[Transpose[s1][[1]], 
              Alternatives @@ (Transpose[s2][[1]])]]]];
    rr1 = Split[Sort[r1], #1[[1]] === #2[[1]] &];
    r2 = s2[[Flatten[
            Position[Transpose[s2][[1]], 
              Alternatives @@ (Transpose[s1][[1]])]]]];
    rr2 = Split[Sort[r2], #1[[1]] === #2[[1]] &];
    Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]


In fact I replaced the last step, which was Transpose[{rr1, rr2}] to what I
had used in myMatch (I'm so sorry for the name!) as to reproduce Brians's
results.

>
>
>
>--
>Allan
>
>---------------------
>Allan Hayes
>Mathematica Training and Consulting
>Leicester UK
>www.haystack.demon.co.uk
>hay at haystack.demon.co.uk
>Voice: +44 (0)116 271 4198
>Fax: +44 (0)870 164 0565
>
>
>"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message
>news:asi1ck$erq$1 at smc.vnet.net...
>> Hi,
>>
>> I have two lists (in general of different length) that have the
>> following structure
>>
>> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>>        Random[Integer, {1, 2000}]}, {100}];
>> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>>        Random[Integer, {1, 2000}]}, {200}];
>>
>> I am then interested in obtaining  all the strings in s1 that match
>> with those in s2. At the present time I use the following agorithm to
>> find the matches
>>
>> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
>> (First[First[#]] == First[Last[#]]) &]
>>
>> This works fine, but when s1 and s2 are large  ( e.g. 3000 or more
>> elements) then Outer seems inefficient.  My question: does 
>anyone have
>> a suggestion that would be more efficient than my kludge approach.
>> Note I have tried Intersection, which finds all the matches, i.e.
>>
>> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
>>
>> But I have not been successful in selecting all the matched pairs
>> using myList2
>>
>> Thanks in advance for any suggestions.
>>
>> Brian
>>
>
>
>

That procedure can be forther improved by cutting down the alternatives and
avoiding recalculation:

myMatch3[s1_, s2_] := 
  Module[{ss1 = Sort[s1], k1, r1, rr1, ss2 = Sort[s2], k2, r2, rr2, alt},
    k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]];
    alt = Alternatives @@ Intersection[k1, k2];
    r1 = ss1[[Flatten[Position[k1, alt]]]];
    rr1 = Split[r1, #1[[1]] === #2[[1]] &];
    r2 = ss2[[Flatten[Position[k2, alt]]]];
    rr2 = Split[r2, #1[[1]] === #2[[1]] &];
    Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]


Now the Timings:
 
In[149]:=
s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], 
        Random[Integer, {1, 2000}]}, {250}];
s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], 
        Random[Integer, {1, 2000}]}, {500}];

(I increased the size of the problem a bit)

In[151]:= (myList = Flatten[Outer[List, s1, s2, 1], 1];
      res00 = 
        Select[myList, (First[First[#]] == First[Last[#]]) &]); // Timing
Out[151]=
{13.42 Second, Null}


In[152]:= res0 = myMatch0[s1, s2]; // Timing
Out[152]= {11.716 Second, Null}

(BTW, why this comes out faster, is something I don't understand)

In[153]:= res0 == Sort[res00]
Out[153]= True


In[154]:= res1 = myMatch[s1, s2]; // Timing
Out[154]= {15.703 Second, Null}

(my miserable one)

In[155]:= res1 == res0
Out[155]= True

In[157]:= res2 = myMatch2[s1, s2]; // Timing
Out[157]= {0.24 Second, Null}

(That's yours, quite fine!)

In[158]:= res2 == res0
Out[158]= True

In[160]:= res3 = myMatch3[s1, s2]; // Timing
Out[160]= {0.131 Second, Null}

(your algorithm, improved)

In[161]:= res3 == res0
Out[161]= True


Astonishingly, pretty old procedural programming is competitive:

getLabelled[s_, labels_] := 
  Module[{i = 1, len = Length[s], r}, 
    (r = {};
     While[i <= len && Order[s[[i, 1]], #] > 0, ++i];   
     While[i <= len && Order[s[[i, 1]], #] == 0, AppendTo[r, s[[i++]]] ];
     r) & /@ labels]

myMatch4[s1_, s2_] := 
  Module[{ss1 = Sort[s1], ss2 = Sort[s2], k1, k2, keys, rr1, rr2}, 
    k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]];
    keys = Intersection[k1, k2];
    rr1 = getLabelled[ss1, keys];
    rr2 = getLabelled[ss2, keys];
    Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]


In[235]:= res4 = myMatch4[s1, s2]; // Timing
Out[235]= {0.25 Second, Null}

In[236]:= res4 == res0
Out[236]= True

(I know, this can be made a bit faster still!)



Finally we also might compare to Sseziwa's solution

In[172]:=
res5 = Join[Cases[s1, {#, _}], Cases[s2, {#, _}]] & /@ 
        Intersection[Union[Transpose[s1][[1]]], 
          Union[Transpose[s2][[1]]]]; // Timing
Out[172]=
{0.621 Second, Null}

Although Union is unnecessary here, it gives a slight performance
improvement (another riddle of Intersection -- to me; I might guess WRI
could do something there).

res4 does not correlate the pairs, so we just look for occurances:

In[180]:= Union[Flatten[res0, 1]] == Union[Flatten[res5, 1]]
Out[180]= True


--
Hartmut
Prev by Date: Re: 1^Infinity
Next by Date: Re: Re: Efficient Sorting Algorithm[3]
Previous by thread: Re: Efficient Sorting Algorithm
Next by thread: Re: Efficient Sorting Algorithm