Re: Efficient Sorting Algorithm
- To: mathgroup at smc.vnet.net
- Subject: [mg38165] Re: Efficient Sorting Algorithm
- From: "Allan Hayes" <hay at haystack.demon.co.uk>
- Date: Wed, 4 Dec 2002 03:24:38 -0500 (EST)
- References: <asi1ck$erq$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Brian,
> I have two lists (in general of different length) that have the
> following structure
s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
Random[Integer, {1, 2000}]}, {100}];
s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
Random[Integer, {1, 2000}]}, {200}];
> I am then interested in obtaining all the strings in s1 that match
> with those in s2. At the present time I use the following agorithm to
> find the matches
[I evaluate and time]
( myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
(First[First[#]] == First[Last[#]]) &])//Timing
(myList = Flatten[Outer[List, s1, s2, 1], 1];
r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing
{2.58 Second, {{{"DCCA", 466}, {"DCCA", 1752}}, {{"CBAA", 115}, {"CBAA",
1267}}, {{"AADC", 1900}, {"AADC", 1256}}, {{"DBAC", 61}, {"DBAC",
962}}, {{"ADDC", 1967}, {"ADDC", 1794}}, {{"EEDE", 1206}, {"EEDE",
758}}, {{"ECDC", 1896}, {"ECDC", 1797}}, {{"DDCC", 1866}, {"DDCC",
1914}}, {{"DEDA", 1213}, {"DEDA", 1166}}, {{"ADAE", 444}, {"ADAE",
1239}}, {{"DBAA", 249}, {"DBAA", 58}}, {{"ECCB", 760}, {"ECCB",
639}}, {{"DDAE", 1511}, {"DDAE", 1878}}, {{"EDBD", 814}, {"EDBD",
565}}, {{"AEBE", 88}, {"AEBE", 608}}, {{"CDBC", 399}, {"CDBC",
1893}}, {{"DDBB", 670}, {"DDBB", 1528}}, {{"AEDC", 1601}, {"AEDC",
1207}}, {{"EDCC", 946}, {"EDCC", 773}}, {{"BEAB", 716}, {"BEAB",
1071}}, {{"BEAB", 716}, {"BEAB", 1098}}, {{"BEAB", 716}, {"BEAB",
1966}}, {{"ADDC", 1703}, {"ADDC", 1794}}, {{"ACDC", 288}, {"ACDC",
132}}, {{"DABE", 396}, {"DABE", 230}}, {{"ADEC", 671}, {"ADEC",
265}}, {{"AABD", 1370}, {"AABD", 1426}}, {{"ACEB", 1088}, {"ACEB",
1979}}, {{"BCDA", 552}, {"BCDA", 910}}, {{"BCAD", 98}, {"BCAD",
102}}, {{"DCBD", 1576}, {"DCBD", 253}}}}
I may be misunderstanding what you are after, but if it is the menbers of
s1 that have the same first element (string) as a member of s2 then the
following seems quite quick
(r2=s1[[Flatten[Position[
Transpose[s1][[1]],
Alternatives@@(Transpose[s2][[1]])]]]])//Timing
{0.06 Second,{{DCCA,466},{CBAA,115},{AADC,1900},{DBAC,61},{ADDC,1967},{EEDE,
1206},{ECDC,1896},{DDCC,1866},{DEDA,1213},{ADAE,444},{DBAA,249},{ECCB,
760},{DDAE,1511},{EDBD,814},{AEBE,88},{CDBC,399},{DDBB,670},{AEDC,
1601},{EDCC,946},{BEAB,716},{ADDC,1703},{ACDC,288},{DABE,396},{ADEC,
671},{AABD,1370},{ACEB,1088},{BCDA,552},{BCAD,98},{DCBD,1576}}}
There is a difference:
Length[r1]
31
Length[r2]
29
This is caused by the occurence of
{{"BEAB", 716}, {"BEAB", 1071}},
{{"BEAB", 716}, {"BEAB", 1098}},
{{"BEAB", 716}, {"BEAB", 1966}}
in r1 compared to
{"BEAB", 716}
in r2
However, we can get the same matching information as from the Outer
technique with the following quick steps
(r3=s2[[Flatten[Position[
Transpose[s2][[1]],
Alternatives@@(Transpose[s1][[1]])]]]]);//Timing
{0.05 Second,Null}
(rr2= Split[Sort[r2], #1[[1]]===#2[[1]]&]);//Timing
{0. Second,Null}
(rr3= Split[Sort[r3], #1[[1]]===#2[[1]]&]);//Timing
{0. Second,Null}
(rr4=Transpose[{rr2,rr3}]);//Timing
{0. Second,Null}
TableForm[rr4]//Timing
{0. Second, AABD 1370 AABD 1426}
AADC 1900 AADC 1256
ACDC 288 ACDC 132
ACEB 1088 ACEB 1979
ADAE 444 ADAE 1239
ADDC 1703
ADDC 1967 ADDC 1794
ADEC 671 ADEC 265
AEBE 88 AEBE 608
AEDC 1601 AEDC 1207
BCAD 98 BCAD 102
BCDA 552 BCDA 910
BEAB 1071
BEAB 1098
BEAB 716 BEAB 1966
CBAA 115 CBAA 1267
CDBC 399 CDBC 1893
DABE 396 DABE 230
DBAA 249 DBAA 58
DBAC 61 DBAC 962
DCBD 1576 DCBD 253
DCCA 466 DCCA 1752
DDAE 1511 DDAE 1878
DDBB 670 DDBB 1528
DDCC 1866 DDCC 1914
DEDA 1213 DEDA 1166
ECCB 760 ECCB 639
ECDC 1896 ECDC 1797
EDBD 814 EDBD 565
EDCC 946 EDCC 773
EEDE 1206 EEDE 758
--
Allan
---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565
"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message
news:asi1ck$erq$1 at smc.vnet.net...
> Hi,
>
> I have two lists (in general of different length) that have the
> following structure
>
> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> Random[Integer, {1, 2000}]}, {100}];
> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
> Random[Integer, {1, 2000}]}, {200}];
>
> I am then interested in obtaining all the strings in s1 that match
> with those in s2. At the present time I use the following agorithm to
> find the matches
>
> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
> (First[First[#]] == First[Last[#]]) &]
>
> This works fine, but when s1 and s2 are large ( e.g. 3000 or more
> elements) then Outer seems inefficient. My question: does anyone have
> a suggestion that would be more efficient than my kludge approach.
> Note I have tried Intersection, which finds all the matches, i.e.
>
> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
>
> But I have not been successful in selecting all the matched pairs
> using myList2
>
> Thanks in advance for any suggestions.
>
> Brian
>