Re: Efficient Sorting Algorithm [2]
- To: mathgroup at smc.vnet.net
- Subject: [mg38209] Re: Efficient Sorting Algorithm [2]
- From: "News Admin" <news at news.demon.net>
- Date: Thu, 5 Dec 2002 03:30:58 -0500 (EST)
- References: <asi1ck$erq$1@smc.vnet.net> <askf45$kv0$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Brian, Here is two slight refinements of my previous posting TEST LISTS s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], Random[Integer, {1, 2000}]}, {2000}]; s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], Random[Integer, {1, 2000}]}, {4000}]; MY PREVIOUS CODE (r2=s1[[Flatten[Position[ Transpose[s1][[1]], Alternatives@@(Transpose[s2][[1]])]]]]);//Timing {1.97 Second,Null} TWO REFINEMENTS (r2a=s1[[Flatten[Position[ s1[[All,1]], Alternatives@@s2[[All,1]]]]]]);//Timing {1.92 Second,Null} If there are many repetitions of the first entries in s2 then it is advantageous to use Union. (r2b=s1[[Flatten[Position[ s1[[All,1]], Alternatives@@Union[s2[[All,1]]]]]]]);//Timing {1.04 Second,Null} Check r2===r2a===r2b True RELATING MATCHES If needed, we can relate the matches as follows (r3b=s2[[Flatten[Position[ s2[[All,1]], Alternatives@@Union[s1[[All,1]]]]]]]);//Timing {1.93 Second,Null} (rr2= Split[Sort[r2b], #1[[1]]===#2[[1]]&]);//Timing {0.28 Second,Null} (rr3= Split[Sort[r3b], #1[[1]]===#2[[1]]&]);//Timing {0.49 Second,Null} (rr4=Transpose[{rr2,rr3}]);//Timing {0. Second,Null} Check TableForm[Take[rr4,6]] AAAA 514 AAAA 438 AAAA 815 AAAA 972 AAAA 1613 AAAA 1022 AAAA 1692 AAAA 1337 AAAA 1692 AAAA 1617 AAAB 425 AAAB 35 AAAB 753 AAAB 183 AAAB 814 AAAB 726 AAAB 1596 AAAB 1507 AAAC 109 AAAC 9 AAAC 380 AAAC 122 AAAC 519 AAAC 974 AAAC 710 AAAC 1482 AAAC 1196 AAAC 1758 AAAC 1760 AAAD 436 AAAD 543 AAAD 1459 AAAD 1620 AAAD 1661 AAAE 119 AAAE 151 AAAE 214 AAAE 460 AAAE 1002 AAAE 603 AAAE 1052 AAAE 1012 AAAE 1127 AAAE 1651 AAAE 1177 AAAE 1905 AAAE 1714 AABA 231 AABA 514 AABA 603 AABA 779 AABA 1248 AABA 1504 AABA 637 AABA 1574 AABA 1576 AABA 1962 -- Allan --------------------- Allan Hayes Mathematica Training and Consulting Leicester UK www.haystack.demon.co.uk hay at haystack.demon.co.uk Voice: +44 (0)116 271 4198 Fax: +44 (0)870 164 0565 "Allan Hayes" <hay at haystack.demon.co.uk> wrote in message news:askf45$kv0$1 at smc.vnet.net... > Brian, > > > I have two lists (in general of different length) that have the > > following structure > > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {100}]; > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {200}]; > > > I am then interested in obtaining all the strings in s1 that match > > with those in s2. At the present time I use the following agorithm to > > find the matches > [I evaluate and time] > > ( myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList, > (First[First[#]] == First[Last[#]]) &])//Timing > > (myList = Flatten[Outer[List, s1, s2, 1], 1]; > r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing > > > > {2.58 Second, {{{"DCCA", 466}, {"DCCA", 1752}}, {{"CBAA", 115}, {"CBAA", > 1267}}, {{"AADC", 1900}, {"AADC", 1256}}, {{"DBAC", 61}, {"DBAC", > 962}}, {{"ADDC", 1967}, {"ADDC", 1794}}, {{"EEDE", 1206}, {"EEDE", > 758}}, {{"ECDC", 1896}, {"ECDC", 1797}}, {{"DDCC", 1866}, {"DDCC", > 1914}}, {{"DEDA", 1213}, {"DEDA", 1166}}, {{"ADAE", 444}, {"ADAE", > 1239}}, {{"DBAA", 249}, {"DBAA", 58}}, {{"ECCB", 760}, {"ECCB", > 639}}, {{"DDAE", 1511}, {"DDAE", 1878}}, {{"EDBD", 814}, {"EDBD", > 565}}, {{"AEBE", 88}, {"AEBE", 608}}, {{"CDBC", 399}, {"CDBC", > 1893}}, {{"DDBB", 670}, {"DDBB", 1528}}, {{"AEDC", 1601}, {"AEDC", > 1207}}, {{"EDCC", 946}, {"EDCC", 773}}, {{"BEAB", 716}, {"BEAB", > 1071}}, {{"BEAB", 716}, {"BEAB", 1098}}, {{"BEAB", 716}, {"BEAB", > 1966}}, {{"ADDC", 1703}, {"ADDC", 1794}}, {{"ACDC", 288}, {"ACDC", > 132}}, {{"DABE", 396}, {"DABE", 230}}, {{"ADEC", 671}, {"ADEC", > 265}}, {{"AABD", 1370}, {"AABD", 1426}}, {{"ACEB", 1088}, {"ACEB", > 1979}}, {{"BCDA", 552}, {"BCDA", 910}}, {{"BCAD", 98}, {"BCAD", > 102}}, {{"DCBD", 1576}, {"DCBD", 253}}}} > > I may be misunderstanding what you are after, but if it is the menbers of > s1 that have the same first element (string) as a member of s2 then the > following seems quite quick > > (r2=s1[[Flatten[Position[ > Transpose[s1][[1]], > Alternatives@@(Transpose[s2][[1]])]]]])//Timing > > {0.06 Second,{{DCCA,466},{CBAA,115},{AADC,1900},{DBAC,61},{ADDC,1967},{EEDE, > 1206},{ECDC,1896},{DDCC,1866},{DEDA,1213},{ADAE,444},{DBAA,249},{ECCB, > 760},{DDAE,1511},{EDBD,814},{AEBE,88},{CDBC,399},{DDBB,670},{AEDC, > 1601},{EDCC,946},{BEAB,716},{ADDC,1703},{ACDC,288},{DABE,396},{ADEC, > 671},{AABD,1370},{ACEB,1088},{BCDA,552},{BCAD,98},{DCBD,1576}}} > > There is a difference: > > Length[r1] > > 31 > > Length[r2] > > 29 > > This is caused by the occurence of > {{"BEAB", 716}, {"BEAB", 1071}}, > {{"BEAB", 716}, {"BEAB", 1098}}, > {{"BEAB", 716}, {"BEAB", 1966}} > > in r1 compared to > > {"BEAB", 716} > > in r2 > > > However, we can get the same matching information as from the Outer > technique with the following quick steps > > (r3=s2[[Flatten[Position[ > Transpose[s2][[1]], > Alternatives@@(Transpose[s1][[1]])]]]]);//Timing > > {0.05 Second,Null} > > (rr2= Split[Sort[r2], #1[[1]]===#2[[1]]&]);//Timing > > {0. Second,Null} > > (rr3= Split[Sort[r3], #1[[1]]===#2[[1]]&]);//Timing > > {0. Second,Null} > > (rr4=Transpose[{rr2,rr3}]);//Timing > > {0. Second,Null} > > TableForm[rr4]//Timing > > {0. Second, AABD 1370 AABD 1426} > > AADC 1900 AADC 1256 > > ACDC 288 ACDC 132 > > ACEB 1088 ACEB 1979 > > ADAE 444 ADAE 1239 > > ADDC 1703 > ADDC 1967 ADDC 1794 > > ADEC 671 ADEC 265 > > AEBE 88 AEBE 608 > > AEDC 1601 AEDC 1207 > > BCAD 98 BCAD 102 > > BCDA 552 BCDA 910 > > BEAB 1071 > BEAB 1098 > BEAB 716 BEAB 1966 > > CBAA 115 CBAA 1267 > > CDBC 399 CDBC 1893 > > DABE 396 DABE 230 > > DBAA 249 DBAA 58 > > DBAC 61 DBAC 962 > > DCBD 1576 DCBD 253 > > DCCA 466 DCCA 1752 > > DDAE 1511 DDAE 1878 > > DDBB 670 DDBB 1528 > > DDCC 1866 DDCC 1914 > > DEDA 1213 DEDA 1166 > > ECCB 760 ECCB 639 > > ECDC 1896 ECDC 1797 > > EDBD 814 EDBD 565 > > EDCC 946 EDCC 773 > > EEDE 1206 EEDE 758 > > > > -- > Allan > > --------------------- > Allan Hayes > Mathematica Training and Consulting > Leicester UK > www.haystack.demon.co.uk > hay at haystack.demon.co.uk > Voice: +44 (0)116 271 4198 > Fax: +44 (0)870 164 0565 > > > "Brian Higgins" <bghiggins at ucdavis.edu> wrote in message > news:asi1ck$erq$1 at smc.vnet.net... > > Hi, > > > > I have two lists (in general of different length) that have the > > following structure > > > > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > > Random[Integer, {1, 2000}]}, {100}]; > > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > > Random[Integer, {1, 2000}]}, {200}]; > > > > I am then interested in obtaining all the strings in s1 that match > > with those in s2. At the present time I use the following agorithm to > > find the matches > > > > myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList, > > (First[First[#]] == First[Last[#]]) &] > > > > This works fine, but when s1 and s2 are large ( e.g. 3000 or more > > elements) then Outer seems inefficient. My question: does anyone have > > a suggestion that would be more efficient than my kludge approach. > > Note I have tried Intersection, which finds all the matches, i.e. > > > > myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)]; > > > > But I have not been successful in selecting all the matched pairs > > using myList2 > > > > Thanks in advance for any suggestions. > > > > Brian > > > > >