MathGroup Archive: December 2002 [00214]

[Date Index] [Thread Index] [Author Index]
Re: Efficient Sorting Algorithm
To: mathgroup at smc.vnet.net
Subject: [mg38245] Re: Efficient Sorting Algorithm
From: "Allan Hayes" <hay at haystack.demon.co.uk>
Date: Tue, 10 Dec 2002 04:09:48 -0500 (EST)
References: <asi1ck$erq$1@smc.vnet.net> <askf45$kv0$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
[I am  re-posting since the subject line did not appear with the original]

Dear Hartmut,

>  But introducing a test in Intersection appears to kill
> its performance).

It may be that introducing a test prevents internal compiling.


One more simplification - and this points up the value of using Alternatives
to reduce the problem to simple pattern matching without conditions or
tests:

    s1=Table[{FromCharacterCode[Table[Random[Integer,{65,69}],{4}]],
        Random[Integer,{1,2000}]},{6000}];
    s2=Table[{FromCharacterCode[Table[Random[Integer,{65,69}],{4}]],
        Random[Integer,{1,2000}]},{12000}];

In my second posting I ended with

        s1[[Flatten[Position[
                s1[[All,1]],
                Alternatives@@Union[s2[[All,1]]]]]]];//Timing

        {3.24 Second,Null}

Using Cases gives a simpler and slightly faster version:

    Cases[
          s1,
          {Alternatives@@Union[s2[[All,1]]],_}];//Timing

        {3.13 Second,Null}

How useful Union or, as below, Intersection are will be, depends of course
on the particular inputs, but for the current ones I get

    Cases[
      s1,
      {Alternatives@@s2[[All,1]],_}];//Timing

        {5.32 Second,Null}

    Cases[
      s1,
      {Alternatives@@Intersection[s1[[All,1]],s2[[All,1]]],_}];//Timing

        {3.41 Second,Null}

In the last one, Cases may be repeating some of the work that Intersection
has done.

Perhaps you can build spme of this in your developments.

Allan

---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565


----- Original Message -----
From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
To: mathgroup at smc.vnet.net
<mukasa at jeol.com>
Subject: [mg38245] RE: [mg38165] Re: Efficient Sorting Algorithm


> Dear Allan,
>
> I was quite astonished about the performance of your proposed algorithm,
as
> I also was about the misperformance of mine (I had not tested, just
believed
> Mathematica will do. But introducing a test in Intersection appears to
kill
> its performance). Nonetheless we might combine ideas to get something
still
> faster.
>
> (To keep all this legible I will snip out in the following part of your
> deduction and examples)
>
>
> >-----Original Message-----
> >From: Allan Hayes [mailto:hay at haystack.demon.co.uk]
To: mathgroup at smc.vnet.net
> >Sent: Wednesday, December 04, 2002 9:25 AM
> >Subject: [mg38245] [mg38165] Re: Efficient Sorting Algorithm
> >
> >
> >Brian,
> >
> >> I have two lists (in general of different length) that have the
> >> following structure
> >
> >    s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> >         Random[Integer, {1, 2000}]}, {100}];
> >     s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}],
{4}]],
> >        Random[Integer, {1, 2000}]}, {200}];
> >
> >> I am then interested in obtaining  all the strings in s1 that match
> >> with those in s2. At the present time I use the following agorithm to
> >> find the matches
> >[I evaluate and time]
> >
> >
> >    (myList = Flatten[Outer[List, s1, s2, 1], 1];
> >    r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing
> >
> >
> >
> > {2.58 Second, --X--}
> >
> >I  may be misunderstanding what you are after, but if it is
> >the menbers of
> >s1 that have the same first element (string) as a  member of
> >s2 then the
> >following seems quite quick
> >
> >(r2=s1[[Flatten[Position[
> >              Transpose[s1][[1]],
> >              Alternatives@@(Transpose[s2][[1]])]]]])//Timing
> >
> >{0.06 Second, --X--}
> >
> >There is a difference:
> >
> --X--
> >
> >This is caused by the occurence of
> >    {{"BEAB", 716}, {"BEAB", 1071}},
> >    {{"BEAB", 716}, {"BEAB", 1098}},
> >    {{"BEAB", 716}, {"BEAB", 1966}}
> >
> >in r1 compared to
> >
> >    {"BEAB", 716}
> >
> >in r2
> >
> >
> >However, we can get the same matching information as from the Outer
> >technique with the following quick steps
> >
> Here I just sewed together your steps to a procedure (for short, with an
> obvious renaming):
>
> myMatch2[s1_, s2_] := Module[{r1, rr1, r2, rr2},
>     r1 = s1[[Flatten[
>             Position[Transpose[s1][[1]],
>               Alternatives @@ (Transpose[s2][[1]])]]]];
>     rr1 = Split[Sort[r1], #1[[1]] === #2[[1]] &];
>     r2 = s2[[Flatten[
>             Position[Transpose[s2][[1]],
>               Alternatives @@ (Transpose[s1][[1]])]]]];
>     rr2 = Split[Sort[r2], #1[[1]] === #2[[1]] &];
>     Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]
>
>
> In fact I replaced the last step, which was Transpose[{rr1, rr2}] to what
I
> had used in myMatch (I'm so sorry for the name!) as to reproduce Brians's
> results.
>
> >
> >
> >
> >--
> >Allan
> >
> >---------------------
> >Allan Hayes
> >Mathematica Training and Consulting
> >Leicester UK
> >www.haystack.demon.co.uk
> >hay at haystack.demon.co.uk
> >Voice: +44 (0)116 271 4198
> >Fax: +44 (0)870 164 0565
> >
> >
> >"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message
> >news:asi1ck$erq$1 at smc.vnet.net...
> >> Hi,
> >>
> >> I have two lists (in general of different length) that have the
> >> following structure
> >>
> >> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> >>        Random[Integer, {1, 2000}]}, {100}];
> >> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> >>        Random[Integer, {1, 2000}]}, {200}];
> >>
> >> I am then interested in obtaining  all the strings in s1 that match
> >> with those in s2. At the present time I use the following agorithm to
> >> find the matches
> >>
> >> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
> >> (First[First[#]] == First[Last[#]]) &]
> >>
> >> This works fine, but when s1 and s2 are large  ( e.g. 3000 or more
> >> elements) then Outer seems inefficient.  My question: does
> >anyone have
> >> a suggestion that would be more efficient than my kludge approach.
> >> Note I have tried Intersection, which finds all the matches, i.e.
> >>
> >> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
> >>
> >> But I have not been successful in selecting all the matched pairs
> >> using myList2
> >>
> >> Thanks in advance for any suggestions.
> >>
> >> Brian
> >>
> >
> >
> >
>
> That procedure can be forther improved by cutting down the alternatives
and
> avoiding recalculation:
>
> myMatch3[s1_, s2_] :=
>   Module[{ss1 = Sort[s1], k1, r1, rr1, ss2 = Sort[s2], k2, r2, rr2, alt},
>     k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]];
>     alt = Alternatives @@ Intersection[k1, k2];
>     r1 = ss1[[Flatten[Position[k1, alt]]]];
>     rr1 = Split[r1, #1[[1]] === #2[[1]] &];
>     r2 = ss2[[Flatten[Position[k2, alt]]]];
>     rr2 = Split[r2, #1[[1]] === #2[[1]] &];
>     Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]
>
>
> Now the Timings:
>
> In[149]:=
> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>         Random[Integer, {1, 2000}]}, {250}];
> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>         Random[Integer, {1, 2000}]}, {500}];
>
> (I increased the size of the problem a bit)
>
> In[151]:= (myList = Flatten[Outer[List, s1, s2, 1], 1];
>       res00 =
>         Select[myList, (First[First[#]] == First[Last[#]]) &]); // Timing
> Out[151]=
> {13.42 Second, Null}
>
>
> In[152]:= res0 = myMatch0[s1, s2]; // Timing
> Out[152]= {11.716 Second, Null}
>
> (BTW, why this comes out faster, is something I don't understand)
>
> In[153]:= res0 == Sort[res00]
> Out[153]= True
>
>
> In[154]:= res1 = myMatch[s1, s2]; // Timing
> Out[154]= {15.703 Second, Null}
>
> (my miserable one)
>
> In[155]:= res1 == res0
> Out[155]= True
>
> In[157]:= res2 = myMatch2[s1, s2]; // Timing
> Out[157]= {0.24 Second, Null}
>
> (That's yours, quite fine!)
>
> In[158]:= res2 == res0
> Out[158]= True
>
> In[160]:= res3 = myMatch3[s1, s2]; // Timing
> Out[160]= {0.131 Second, Null}
>
> (your algorithm, improved)
>
> In[161]:= res3 == res0
> Out[161]= True
>
>
> Astonishingly, pretty old procedural programming is competitive:
>
> getLabelled[s_, labels_] :=
>   Module[{i = 1, len = Length[s], r},
>     (r = {};
>      While[i <= len && Order[s[[i, 1]], #] > 0, ++i];
>      While[i <= len && Order[s[[i, 1]], #] == 0, AppendTo[r, s[[i++]]] ];
>      r) & /@ labels]
>
> myMatch4[s1_, s2_] :=
>   Module[{ss1 = Sort[s1], ss2 = Sort[s2], k1, k2, keys, rr1, rr2},
>     k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]];
>     keys = Intersection[k1, k2];
>     rr1 = getLabelled[ss1, keys];
>     rr2 = getLabelled[ss2, keys];
>     Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]]
>
>
> In[235]:= res4 = myMatch4[s1, s2]; // Timing
> Out[235]= {0.25 Second, Null}
>
> In[236]:= res4 == res0
> Out[236]= True
>
> (I know, this can be made a bit faster still!)
>
>
>
> Finally we also might compare to Sseziwa's solution
>
> In[172]:=
> res5 = Join[Cases[s1, {#, _}], Cases[s2, {#, _}]] & /@
>         Intersection[Union[Transpose[s1][[1]]],
>           Union[Transpose[s2][[1]]]]; // Timing
> Out[172]=
> {0.621 Second, Null}
>
> Although Union is unnecessary here, it gives a slight performance
> improvement (another riddle of Intersection -- to me; I might guess WRI
> could do something there).
>
> res4 does not correlate the pairs, so we just look for occurances:
>
> In[180]:= Union[Flatten[res0, 1]] == Union[Flatten[res5, 1]]
> Out[180]= True
>
>
> --
> Hartmut
>
>
Prev by Date: RE: surface dependent mesh or a smooth surface?
Next by Date: visual map
Previous by thread: Re: Efficient Sorting Algorithm
Next by thread: Re: Efficient Sorting Algorithm