MathGroup Archive: December 2002 [00147]

[Date Index] [Thread Index] [Author Index]
Re: Efficient Sorting Algorithm [2]
To: mathgroup at smc.vnet.net
Subject: [mg38209] Re: Efficient Sorting Algorithm [2]
From: "News Admin" <news at news.demon.net>
Date: Thu, 5 Dec 2002 03:30:58 -0500 (EST)
References: <asi1ck$erq$1@smc.vnet.net> <askf45$kv0$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
Brian,
Here is two slight refinements of my previous posting

TEST LISTS

    s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
        Random[Integer, {1, 2000}]}, {2000}];
    s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
        Random[Integer, {1, 2000}]}, {4000}];

MY PREVIOUS CODE

    (r2=s1[[Flatten[Position[
                Transpose[s1][[1]],
                Alternatives@@(Transpose[s2][[1]])]]]]);//Timing

        {1.97 Second,Null}

TWO REFINEMENTS

    (r2a=s1[[Flatten[Position[
                s1[[All,1]],
                Alternatives@@s2[[All,1]]]]]]);//Timing

        {1.92 Second,Null}

 If there are many repetitions of the first entries in s2 then it is
advantageous to use Union.

    (r2b=s1[[Flatten[Position[
                s1[[All,1]],
                Alternatives@@Union[s2[[All,1]]]]]]]);//Timing

        {1.04 Second,Null}

Check

    r2===r2a===r2b

        True

RELATING MATCHES

If needed, we can relate the matches as follows

    (r3b=s2[[Flatten[Position[
                s2[[All,1]],
                Alternatives@@Union[s1[[All,1]]]]]]]);//Timing

        {1.93 Second,Null}

    (rr2= Split[Sort[r2b], #1[[1]]===#2[[1]]&]);//Timing

        {0.28 Second,Null}

    (rr3= Split[Sort[r3b], #1[[1]]===#2[[1]]&]);//Timing

        {0.49 Second,Null}

    (rr4=Transpose[{rr2,rr3}]);//Timing

        {0. Second,Null}

Check

TableForm[Take[rr4,6]]

AAAA 514    AAAA 438
AAAA 815    AAAA 972
AAAA 1613   AAAA 1022
AAAA 1692   AAAA 1337
AAAA 1692   AAAA 1617

AAAB 425    AAAB 35
AAAB 753    AAAB 183
AAAB 814    AAAB 726
AAAB 1596   AAAB 1507

                    AAAC 109
AAAC 9        AAAC 380
AAAC 122    AAAC 519
AAAC 974    AAAC 710
AAAC 1482   AAAC 1196
AAAC 1758   AAAC 1760

                    AAAD 436
AAAD 543     AAAD 1459
AAAD 1620   AAAD 1661

                    AAAE 119
                    AAAE 151
                    AAAE 214
AAAE 460    AAAE 1002
AAAE 603    AAAE 1052
AAAE 1012   AAAE 1127
AAAE 1651   AAAE 1177
AAAE 1905   AAAE 1714

                    AABA 231
                    AABA 514
                    AABA 603
                    AABA 779
                    AABA 1248
                    AABA 1504
AABA 637    AABA 1574
AABA 1576   AABA 1962

--
Allan

---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565


"Allan Hayes" <hay at haystack.demon.co.uk> wrote in message
news:askf45$kv0$1 at smc.vnet.net...
> Brian,
>
> > I have two lists (in general of different length) that have the
> > following structure
>
>     s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>          Random[Integer, {1, 2000}]}, {100}];
>      s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>         Random[Integer, {1, 2000}]}, {200}];
>
> > I am then interested in obtaining  all the strings in s1 that match
> > with those in s2. At the present time I use the following agorithm to
> > find the matches
> [I evaluate and time]
>
>     ( myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
>     (First[First[#]] == First[Last[#]]) &])//Timing
>
>     (myList = Flatten[Outer[List, s1, s2, 1], 1];
>     r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing
>
>
>
>  {2.58 Second, {{{"DCCA", 466}, {"DCCA", 1752}}, {{"CBAA", 115}, {"CBAA",
>         1267}}, {{"AADC", 1900}, {"AADC", 1256}}, {{"DBAC", 61}, {"DBAC",
>         962}}, {{"ADDC", 1967}, {"ADDC", 1794}}, {{"EEDE", 1206}, {"EEDE",
>         758}}, {{"ECDC", 1896}, {"ECDC", 1797}}, {{"DDCC", 1866}, {"DDCC",
>         1914}}, {{"DEDA", 1213}, {"DEDA", 1166}}, {{"ADAE", 444}, {"ADAE",
>         1239}}, {{"DBAA", 249}, {"DBAA", 58}}, {{"ECCB", 760}, {"ECCB",
>         639}}, {{"DDAE", 1511}, {"DDAE", 1878}}, {{"EDBD", 814}, {"EDBD",
>         565}}, {{"AEBE", 88}, {"AEBE", 608}}, {{"CDBC", 399}, {"CDBC",
>         1893}}, {{"DDBB", 670}, {"DDBB", 1528}}, {{"AEDC", 1601}, {"AEDC",
>         1207}}, {{"EDCC", 946}, {"EDCC", 773}}, {{"BEAB", 716}, {"BEAB",
>         1071}}, {{"BEAB", 716}, {"BEAB", 1098}}, {{"BEAB", 716}, {"BEAB",
>         1966}}, {{"ADDC", 1703}, {"ADDC", 1794}}, {{"ACDC", 288}, {"ACDC",
>         132}}, {{"DABE", 396}, {"DABE", 230}}, {{"ADEC", 671}, {"ADEC",
>         265}}, {{"AABD", 1370}, {"AABD", 1426}}, {{"ACEB", 1088}, {"ACEB",
>         1979}}, {{"BCDA", 552}, {"BCDA", 910}}, {{"BCAD", 98}, {"BCAD",
>         102}}, {{"DCBD", 1576}, {"DCBD", 253}}}}
>
> I  may be misunderstanding what you are after, but if it is the menbers of
> s1 that have the same first element (string) as a  member of s2 then the
> following seems quite quick
>
> (r2=s1[[Flatten[Position[
>               Transpose[s1][[1]],
>               Alternatives@@(Transpose[s2][[1]])]]]])//Timing
>
> {0.06
Second,{{DCCA,466},{CBAA,115},{AADC,1900},{DBAC,61},{ADDC,1967},{EEDE,
>
1206},{ECDC,1896},{DDCC,1866},{DEDA,1213},{ADAE,444},{DBAA,249},{ECCB,
>       760},{DDAE,1511},{EDBD,814},{AEBE,88},{CDBC,399},{DDBB,670},{AEDC,
>       1601},{EDCC,946},{BEAB,716},{ADDC,1703},{ACDC,288},{DABE,396},{ADEC,
>       671},{AABD,1370},{ACEB,1088},{BCDA,552},{BCAD,98},{DCBD,1576}}}
>
> There is a difference:
>
>     Length[r1]
>
>         31
>
>     Length[r2]
>
>         29
>
> This is caused by the occurence of
>     {{"BEAB", 716}, {"BEAB", 1071}},
>     {{"BEAB", 716}, {"BEAB", 1098}},
>     {{"BEAB", 716}, {"BEAB", 1966}}
>
> in r1 compared to
>
>     {"BEAB", 716}
>
> in r2
>
>
> However, we can get the same matching information as from the Outer
> technique with the following quick steps
>
>     (r3=s2[[Flatten[Position[
>                 Transpose[s2][[1]],
>                 Alternatives@@(Transpose[s1][[1]])]]]]);//Timing
>
>         {0.05 Second,Null}
>
>     (rr2= Split[Sort[r2], #1[[1]]===#2[[1]]&]);//Timing
>
>         {0. Second,Null}
>
>     (rr3= Split[Sort[r3], #1[[1]]===#2[[1]]&]);//Timing
>
>         {0. Second,Null}
>
>     (rr4=Transpose[{rr2,rr3}]);//Timing
>
>         {0. Second,Null}
>
>     TableForm[rr4]//Timing
>
>         {0. Second, AABD 1370   AABD 1426}
>
>             AADC 1900   AADC 1256
>
>             ACDC 288    ACDC 132
>
>             ACEB 1088   ACEB 1979
>
>             ADAE 444    ADAE 1239
>
>             ADDC 1703
>             ADDC 1967   ADDC 1794
>
>             ADEC 671    ADEC 265
>
>             AEBE 88     AEBE 608
>
>             AEDC 1601   AEDC 1207
>
>             BCAD 98     BCAD 102
>
>             BCDA 552    BCDA 910
>
>                                BEAB 1071
>                                BEAB 1098
>             BEAB 716    BEAB 1966
>
>             CBAA 115    CBAA 1267
>
>             CDBC 399    CDBC 1893
>
>             DABE 396    DABE 230
>
>             DBAA 249    DBAA 58
>
>             DBAC 61     DBAC 962
>
>             DCBD 1576   DCBD 253
>
>             DCCA 466    DCCA 1752
>
>             DDAE 1511   DDAE 1878
>
>             DDBB 670    DDBB 1528
>
>             DDCC 1866   DDCC 1914
>
>             DEDA 1213   DEDA 1166
>
>             ECCB 760    ECCB 639
>
>             ECDC 1896   ECDC 1797
>
>             EDBD 814    EDBD 565
>
>             EDCC 946    EDCC 773
>
>             EEDE 1206   EEDE 758
>
>
>
> --
> Allan
>
> ---------------------
> Allan Hayes
> Mathematica Training and Consulting
> Leicester UK
> www.haystack.demon.co.uk
> hay at haystack.demon.co.uk
> Voice: +44 (0)116 271 4198
> Fax: +44 (0)870 164 0565
>
>
> "Brian Higgins" <bghiggins at ucdavis.edu> wrote in message
> news:asi1ck$erq$1 at smc.vnet.net...
> > Hi,
> >
> > I have two lists (in general of different length) that have the
> > following structure
> >
> > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> >        Random[Integer, {1, 2000}]}, {100}];
> > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> >        Random[Integer, {1, 2000}]}, {200}];
> >
> > I am then interested in obtaining  all the strings in s1 that match
> > with those in s2. At the present time I use the following agorithm to
> > find the matches
> >
> > myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
> > (First[First[#]] == First[Last[#]]) &]
> >
> > This works fine, but when s1 and s2 are large  ( e.g. 3000 or more
> > elements) then Outer seems inefficient.  My question: does anyone have
> > a suggestion that would be more efficient than my kludge approach.
> > Note I have tried Intersection, which finds all the matches, i.e.
> >
> > myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
> >
> > But I have not been successful in selecting all the matched pairs
> > using myList2
> >
> > Thanks in advance for any suggestions.
> >
> > Brian
> >
>
>
>
Prev by Date: Re: function composition
Next by Date: Re: 3D Animations are killing my system
Previous by thread: TriangularSurfacePlot question
Next by thread: Re: Re: Efficient Sorting Algorithm[3]