RE: Re: Efficient Sorting Algorithm
- To: mathgroup at smc.vnet.net
- Subject: [mg38217] RE: [mg38165] Re: Efficient Sorting Algorithm
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Thu, 5 Dec 2002 03:31:54 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Dear Allan, I was quite astonished about the performance of your proposed algorithm, as I also was about the misperformance of mine (I had not tested, just believed Mathematica will do. But introducing a test in Intersection appears to kill its performance). Nonetheless we might combine ideas to get something still faster. (To keep all this legible I will snip out in the following part of your deduction and examples) >-----Original Message----- >From: Allan Hayes [mailto:hay at haystack.demon.co.uk] To: mathgroup at smc.vnet.net >Sent: Wednesday, December 04, 2002 9:25 AM >To: mathgroup at smc.vnet.net >Subject: [mg38217] [mg38165] Re: Efficient Sorting Algorithm > > >Brian, > >> I have two lists (in general of different length) that have the >> following structure > > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {100}]; > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {200}]; > >> I am then interested in obtaining all the strings in s1 that match >> with those in s2. At the present time I use the following agorithm to >> find the matches >[I evaluate and time] > > > (myList = Flatten[Outer[List, s1, s2, 1], 1]; > r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing > > > > {2.58 Second, --X--} > >I may be misunderstanding what you are after, but if it is >the menbers of >s1 that have the same first element (string) as a member of >s2 then the >following seems quite quick > >(r2=s1[[Flatten[Position[ > Transpose[s1][[1]], > Alternatives@@(Transpose[s2][[1]])]]]])//Timing > >{0.06 Second, --X--} > >There is a difference: > --X-- > >This is caused by the occurence of > {{"BEAB", 716}, {"BEAB", 1071}}, > {{"BEAB", 716}, {"BEAB", 1098}}, > {{"BEAB", 716}, {"BEAB", 1966}} > >in r1 compared to > > {"BEAB", 716} > >in r2 > > >However, we can get the same matching information as from the Outer >technique with the following quick steps > Here I just sewed together your steps to a procedure (for short, with an obvious renaming): myMatch2[s1_, s2_] := Module[{r1, rr1, r2, rr2}, r1 = s1[[Flatten[ Position[Transpose[s1][[1]], Alternatives @@ (Transpose[s2][[1]])]]]]; rr1 = Split[Sort[r1], #1[[1]] === #2[[1]] &]; r2 = s2[[Flatten[ Position[Transpose[s2][[1]], Alternatives @@ (Transpose[s1][[1]])]]]]; rr2 = Split[Sort[r2], #1[[1]] === #2[[1]] &]; Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] In fact I replaced the last step, which was Transpose[{rr1, rr2}] to what I had used in myMatch (I'm so sorry for the name!) as to reproduce Brians's results. > > > >-- >Allan > >--------------------- >Allan Hayes >Mathematica Training and Consulting >Leicester UK >www.haystack.demon.co.uk >hay at haystack.demon.co.uk >Voice: +44 (0)116 271 4198 >Fax: +44 (0)870 164 0565 > > >"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message >news:asi1ck$erq$1 at smc.vnet.net... >> Hi, >> >> I have two lists (in general of different length) that have the >> following structure >> >> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], >> Random[Integer, {1, 2000}]}, {100}]; >> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], >> Random[Integer, {1, 2000}]}, {200}]; >> >> I am then interested in obtaining all the strings in s1 that match >> with those in s2. At the present time I use the following agorithm to >> find the matches >> >> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList, >> (First[First[#]] == First[Last[#]]) &] >> >> This works fine, but when s1 and s2 are large ( e.g. 3000 or more >> elements) then Outer seems inefficient. My question: does >anyone have >> a suggestion that would be more efficient than my kludge approach. >> Note I have tried Intersection, which finds all the matches, i.e. >> >> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)]; >> >> But I have not been successful in selecting all the matched pairs >> using myList2 >> >> Thanks in advance for any suggestions. >> >> Brian >> > > > That procedure can be forther improved by cutting down the alternatives and avoiding recalculation: myMatch3[s1_, s2_] := Module[{ss1 = Sort[s1], k1, r1, rr1, ss2 = Sort[s2], k2, r2, rr2, alt}, k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]]; alt = Alternatives @@ Intersection[k1, k2]; r1 = ss1[[Flatten[Position[k1, alt]]]]; rr1 = Split[r1, #1[[1]] === #2[[1]] &]; r2 = ss2[[Flatten[Position[k2, alt]]]]; rr2 = Split[r2, #1[[1]] === #2[[1]] &]; Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] Now the Timings: In[149]:= s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], Random[Integer, {1, 2000}]}, {250}]; s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], Random[Integer, {1, 2000}]}, {500}]; (I increased the size of the problem a bit) In[151]:= (myList = Flatten[Outer[List, s1, s2, 1], 1]; res00 = Select[myList, (First[First[#]] == First[Last[#]]) &]); // Timing Out[151]= {13.42 Second, Null} In[152]:= res0 = myMatch0[s1, s2]; // Timing Out[152]= {11.716 Second, Null} (BTW, why this comes out faster, is something I don't understand) In[153]:= res0 == Sort[res00] Out[153]= True In[154]:= res1 = myMatch[s1, s2]; // Timing Out[154]= {15.703 Second, Null} (my miserable one) In[155]:= res1 == res0 Out[155]= True In[157]:= res2 = myMatch2[s1, s2]; // Timing Out[157]= {0.24 Second, Null} (That's yours, quite fine!) In[158]:= res2 == res0 Out[158]= True In[160]:= res3 = myMatch3[s1, s2]; // Timing Out[160]= {0.131 Second, Null} (your algorithm, improved) In[161]:= res3 == res0 Out[161]= True Astonishingly, pretty old procedural programming is competitive: getLabelled[s_, labels_] := Module[{i = 1, len = Length[s], r}, (r = {}; While[i <= len && Order[s[[i, 1]], #] > 0, ++i]; While[i <= len && Order[s[[i, 1]], #] == 0, AppendTo[r, s[[i++]]] ]; r) & /@ labels] myMatch4[s1_, s2_] := Module[{ss1 = Sort[s1], ss2 = Sort[s2], k1, k2, keys, rr1, rr2}, k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]]; keys = Intersection[k1, k2]; rr1 = getLabelled[ss1, keys]; rr2 = getLabelled[ss2, keys]; Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] In[235]:= res4 = myMatch4[s1, s2]; // Timing Out[235]= {0.25 Second, Null} In[236]:= res4 == res0 Out[236]= True (I know, this can be made a bit faster still!) Finally we also might compare to Sseziwa's solution In[172]:= res5 = Join[Cases[s1, {#, _}], Cases[s2, {#, _}]] & /@ Intersection[Union[Transpose[s1][[1]]], Union[Transpose[s2][[1]]]]; // Timing Out[172]= {0.621 Second, Null} Although Union is unnecessary here, it gives a slight performance improvement (another riddle of Intersection -- to me; I might guess WRI could do something there). res4 does not correlate the pairs, so we just look for occurances: In[180]:= Union[Flatten[res0, 1]] == Union[Flatten[res5, 1]] Out[180]= True -- Hartmut