MathGroup Archive: December 2002 [00110]

[Date Index] [Thread Index] [Author Index]
Re: Efficient Sorting Algorithm
To: mathgroup at smc.vnet.net
Subject: [mg38165] Re: Efficient Sorting Algorithm
From: "Allan Hayes" <hay at haystack.demon.co.uk>
Date: Wed, 4 Dec 2002 03:24:38 -0500 (EST)
References: <asi1ck$erq$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
Brian,

> I have two lists (in general of different length) that have the
> following structure

    s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
         Random[Integer, {1, 2000}]}, {100}];
     s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
        Random[Integer, {1, 2000}]}, {200}];

> I am then interested in obtaining  all the strings in s1 that match
> with those in s2. At the present time I use the following agorithm to
> find the matches
[I evaluate and time]

    ( myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
    (First[First[#]] == First[Last[#]]) &])//Timing

    (myList = Flatten[Outer[List, s1, s2, 1], 1];
    r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing



 {2.58 Second, {{{"DCCA", 466}, {"DCCA", 1752}}, {{"CBAA", 115}, {"CBAA",
        1267}}, {{"AADC", 1900}, {"AADC", 1256}}, {{"DBAC", 61}, {"DBAC",
        962}}, {{"ADDC", 1967}, {"ADDC", 1794}}, {{"EEDE", 1206}, {"EEDE",
        758}}, {{"ECDC", 1896}, {"ECDC", 1797}}, {{"DDCC", 1866}, {"DDCC",
        1914}}, {{"DEDA", 1213}, {"DEDA", 1166}}, {{"ADAE", 444}, {"ADAE",
        1239}}, {{"DBAA", 249}, {"DBAA", 58}}, {{"ECCB", 760}, {"ECCB",
        639}}, {{"DDAE", 1511}, {"DDAE", 1878}}, {{"EDBD", 814}, {"EDBD",
        565}}, {{"AEBE", 88}, {"AEBE", 608}}, {{"CDBC", 399}, {"CDBC",
        1893}}, {{"DDBB", 670}, {"DDBB", 1528}}, {{"AEDC", 1601}, {"AEDC",
        1207}}, {{"EDCC", 946}, {"EDCC", 773}}, {{"BEAB", 716}, {"BEAB",
        1071}}, {{"BEAB", 716}, {"BEAB", 1098}}, {{"BEAB", 716}, {"BEAB",
        1966}}, {{"ADDC", 1703}, {"ADDC", 1794}}, {{"ACDC", 288}, {"ACDC",
        132}}, {{"DABE", 396}, {"DABE", 230}}, {{"ADEC", 671}, {"ADEC",
        265}}, {{"AABD", 1370}, {"AABD", 1426}}, {{"ACEB", 1088}, {"ACEB",
        1979}}, {{"BCDA", 552}, {"BCDA", 910}}, {{"BCAD", 98}, {"BCAD",
        102}}, {{"DCBD", 1576}, {"DCBD", 253}}}}

I  may be misunderstanding what you are after, but if it is the menbers of
s1 that have the same first element (string) as a  member of s2 then the
following seems quite quick

(r2=s1[[Flatten[Position[
              Transpose[s1][[1]],
              Alternatives@@(Transpose[s2][[1]])]]]])//Timing

{0.06 Second,{{DCCA,466},{CBAA,115},{AADC,1900},{DBAC,61},{ADDC,1967},{EEDE,
      1206},{ECDC,1896},{DDCC,1866},{DEDA,1213},{ADAE,444},{DBAA,249},{ECCB,
      760},{DDAE,1511},{EDBD,814},{AEBE,88},{CDBC,399},{DDBB,670},{AEDC,
      1601},{EDCC,946},{BEAB,716},{ADDC,1703},{ACDC,288},{DABE,396},{ADEC,
      671},{AABD,1370},{ACEB,1088},{BCDA,552},{BCAD,98},{DCBD,1576}}}

There is a difference:

    Length[r1]

        31

    Length[r2]

        29

This is caused by the occurence of
    {{"BEAB", 716}, {"BEAB", 1071}},
    {{"BEAB", 716}, {"BEAB", 1098}},
    {{"BEAB", 716}, {"BEAB", 1966}}

in r1 compared to

    {"BEAB", 716}

in r2


However, we can get the same matching information as from the Outer
technique with the following quick steps

    (r3=s2[[Flatten[Position[
                Transpose[s2][[1]],
                Alternatives@@(Transpose[s1][[1]])]]]]);//Timing

        {0.05 Second,Null}

    (rr2= Split[Sort[r2], #1[[1]]===#2[[1]]&]);//Timing

        {0. Second,Null}

    (rr3= Split[Sort[r3], #1[[1]]===#2[[1]]&]);//Timing

        {0. Second,Null}

    (rr4=Transpose[{rr2,rr3}]);//Timing

        {0. Second,Null}

    TableForm[rr4]//Timing

        {0. Second, AABD 1370   AABD 1426}

            AADC 1900   AADC 1256

            ACDC 288    ACDC 132

            ACEB 1088   ACEB 1979

            ADAE 444    ADAE 1239

            ADDC 1703
            ADDC 1967   ADDC 1794

            ADEC 671    ADEC 265

            AEBE 88     AEBE 608

            AEDC 1601   AEDC 1207

            BCAD 98     BCAD 102

            BCDA 552    BCDA 910

                               BEAB 1071
                               BEAB 1098
            BEAB 716    BEAB 1966

            CBAA 115    CBAA 1267

            CDBC 399    CDBC 1893

            DABE 396    DABE 230

            DBAA 249    DBAA 58

            DBAC 61     DBAC 962

            DCBD 1576   DCBD 253

            DCCA 466    DCCA 1752

            DDAE 1511   DDAE 1878

            DDBB 670    DDBB 1528

            DDCC 1866   DDCC 1914

            DEDA 1213   DEDA 1166

            ECCB 760    ECCB 639

            ECDC 1896   ECDC 1797

            EDBD 814    EDBD 565

            EDCC 946    EDCC 773

            EEDE 1206   EEDE 758



--
Allan

---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565


"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message
news:asi1ck$erq$1 at smc.vnet.net...
> Hi,
>
> I have two lists (in general of different length) that have the
> following structure
>
> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>        Random[Integer, {1, 2000}]}, {100}];
> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>        Random[Integer, {1, 2000}]}, {200}];
>
> I am then interested in obtaining  all the strings in s1 that match
> with those in s2. At the present time I use the following agorithm to
> find the matches
>
> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
> (First[First[#]] == First[Last[#]]) &]
>
> This works fine, but when s1 and s2 are large  ( e.g. 3000 or more
> elements) then Outer seems inefficient.  My question: does anyone have
> a suggestion that would be more efficient than my kludge approach.
> Note I have tried Intersection, which finds all the matches, i.e.
>
> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
>
> But I have not been successful in selecting all the matched pairs
> using myList2
>
> Thanks in advance for any suggestions.
>
> Brian
>
Prev by Date: RE: Re: solving non-algebraic exponential equations
Next by Date: RE: 3D Electric Field Approximation
Previous by thread: Re: Efficient Sorting Algorithm
Next by thread: RE: Re: Efficient Sorting Algorithm