Re: Re: Efficient Sorting Algorithm[3]
- To: mathgroup at smc.vnet.net
- Subject: [mg38229] Re: [mg38165] Re: Efficient Sorting Algorithm[3]
- From: "Allan Hayes" <hay at haystack.demon.co.uk>
- Date: Thu, 5 Dec 2002 03:36:19 -0500 (EST)
- References: <8EB1C3634596D6118952006008F711CD5BCC4E@debis.com>
- Sender: owner-wri-mathgroup at wolfram.com
Dear Hartmut, > But introducing a test in Intersection appears to kill > its performance). It may be that introducing a test prevents internal compiling. One more simplification - and this points up the value of using Alternatives to reduce the problem to simple pattern matching without conditions or tests: s1=Table[{FromCharacterCode[Table[Random[Integer,{65,69}],{4}]], Random[Integer,{1,2000}]},{6000}]; s2=Table[{FromCharacterCode[Table[Random[Integer,{65,69}],{4}]], Random[Integer,{1,2000}]},{12000}]; In my second posting I ended with s1[[Flatten[Position[ s1[[All,1]], Alternatives@@Union[s2[[All,1]]]]]]];//Timing {3.24 Second,Null} Using Cases gives a simpler and slightly faster version: Cases[ s1, {Alternatives@@Union[s2[[All,1]]],_}];//Timing {3.13 Second,Null} How useful Union or, as below, Intersection are will be, depends of course on the particular inputs, but for the current ones I get Cases[ s1, {Alternatives@@s2[[All,1]],_}];//Timing {5.32 Second,Null} Cases[ s1, {Alternatives@@Intersection[s1[[All,1]],s2[[All,1]]],_}];//Timing {3.41 Second,Null} In the last one, Cases may be repeating some of the work that Intersection has done. Perhaps you can build spme of this in your developments. Allan --------------------- Allan Hayes Mathematica Training and Consulting Leicester UK www.haystack.demon.co.uk hay at haystack.demon.co.uk Voice: +44 (0)116 271 4198 Fax: +44 (0)870 164 0565 ----- Original Message ----- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com> To: mathgroup at smc.vnet.net <mukasa at jeol.com> Subject: [mg38229] RE: [mg38165] Re: Efficient Sorting Algorithm > Dear Allan, > > I was quite astonished about the performance of your proposed algorithm, as > I also was about the misperformance of mine (I had not tested, just believed > Mathematica will do. But introducing a test in Intersection appears to kill > its performance). Nonetheless we might combine ideas to get something still > faster. > > (To keep all this legible I will snip out in the following part of your > deduction and examples) > > > >-----Original Message----- > >From: Allan Hayes [mailto:hay at haystack.demon.co.uk] To: mathgroup at smc.vnet.net > >Sent: Wednesday, December 04, 2002 9:25 AM > >To: mathgroup at smc.vnet.net > >Subject: [mg38229] [mg38165] Re: Efficient Sorting Algorithm > > > > > >Brian, > > > >> I have two lists (in general of different length) that have the > >> following structure > > > > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > > Random[Integer, {1, 2000}]}, {100}]; > > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > > Random[Integer, {1, 2000}]}, {200}]; > > > >> I am then interested in obtaining all the strings in s1 that match > >> with those in s2. At the present time I use the following agorithm to > >> find the matches > >[I evaluate and time] > > > > > > (myList = Flatten[Outer[List, s1, s2, 1], 1]; > > r1 = Select[myList, (First[First[#]] == First[Last[#]]) &]) // Timing > > > > > > > > {2.58 Second, --X--} > > > >I may be misunderstanding what you are after, but if it is > >the menbers of > >s1 that have the same first element (string) as a member of > >s2 then the > >following seems quite quick > > > >(r2=s1[[Flatten[Position[ > > Transpose[s1][[1]], > > Alternatives@@(Transpose[s2][[1]])]]]])//Timing > > > >{0.06 Second, --X--} > > > >There is a difference: > > > --X-- > > > >This is caused by the occurence of > > {{"BEAB", 716}, {"BEAB", 1071}}, > > {{"BEAB", 716}, {"BEAB", 1098}}, > > {{"BEAB", 716}, {"BEAB", 1966}} > > > >in r1 compared to > > > > {"BEAB", 716} > > > >in r2 > > > > > >However, we can get the same matching information as from the Outer > >technique with the following quick steps > > > Here I just sewed together your steps to a procedure (for short, with an > obvious renaming): > > myMatch2[s1_, s2_] := Module[{r1, rr1, r2, rr2}, > r1 = s1[[Flatten[ > Position[Transpose[s1][[1]], > Alternatives @@ (Transpose[s2][[1]])]]]]; > rr1 = Split[Sort[r1], #1[[1]] === #2[[1]] &]; > r2 = s2[[Flatten[ > Position[Transpose[s2][[1]], > Alternatives @@ (Transpose[s1][[1]])]]]]; > rr2 = Split[Sort[r2], #1[[1]] === #2[[1]] &]; > Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] > > > In fact I replaced the last step, which was Transpose[{rr1, rr2}] to what I > had used in myMatch (I'm so sorry for the name!) as to reproduce Brians's > results. > > > > > > > > >-- > >Allan > > > >--------------------- > >Allan Hayes > >Mathematica Training and Consulting > >Leicester UK > >www.haystack.demon.co.uk > >hay at haystack.demon.co.uk > >Voice: +44 (0)116 271 4198 > >Fax: +44 (0)870 164 0565 > > > > > >"Brian Higgins" <bghiggins at ucdavis.edu> wrote in message > >news:asi1ck$erq$1 at smc.vnet.net... > >> Hi, > >> > >> I have two lists (in general of different length) that have the > >> following structure > >> > >> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > >> Random[Integer, {1, 2000}]}, {100}]; > >> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > >> Random[Integer, {1, 2000}]}, {200}]; > >> > >> I am then interested in obtaining all the strings in s1 that match > >> with those in s2. At the present time I use the following agorithm to > >> find the matches > >> > >> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList, > >> (First[First[#]] == First[Last[#]]) &] > >> > >> This works fine, but when s1 and s2 are large ( e.g. 3000 or more > >> elements) then Outer seems inefficient. My question: does > >anyone have > >> a suggestion that would be more efficient than my kludge approach. > >> Note I have tried Intersection, which finds all the matches, i.e. > >> > >> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)]; > >> > >> But I have not been successful in selecting all the matched pairs > >> using myList2 > >> > >> Thanks in advance for any suggestions. > >> > >> Brian > >> > > > > > > > > That procedure can be forther improved by cutting down the alternatives and > avoiding recalculation: > > myMatch3[s1_, s2_] := > Module[{ss1 = Sort[s1], k1, r1, rr1, ss2 = Sort[s2], k2, r2, rr2, alt}, > k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]]; > alt = Alternatives @@ Intersection[k1, k2]; > r1 = ss1[[Flatten[Position[k1, alt]]]]; > rr1 = Split[r1, #1[[1]] === #2[[1]] &]; > r2 = ss2[[Flatten[Position[k2, alt]]]]; > rr2 = Split[r2, #1[[1]] === #2[[1]] &]; > Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] > > > Now the Timings: > > In[149]:= > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {250}]; > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {500}]; > > (I increased the size of the problem a bit) > > In[151]:= (myList = Flatten[Outer[List, s1, s2, 1], 1]; > res00 = > Select[myList, (First[First[#]] == First[Last[#]]) &]); // Timing > Out[151]= > {13.42 Second, Null} > > > In[152]:= res0 = myMatch0[s1, s2]; // Timing > Out[152]= {11.716 Second, Null} > > (BTW, why this comes out faster, is something I don't understand) > > In[153]:= res0 == Sort[res00] > Out[153]= True > > > In[154]:= res1 = myMatch[s1, s2]; // Timing > Out[154]= {15.703 Second, Null} > > (my miserable one) > > In[155]:= res1 == res0 > Out[155]= True > > In[157]:= res2 = myMatch2[s1, s2]; // Timing > Out[157]= {0.24 Second, Null} > > (That's yours, quite fine!) > > In[158]:= res2 == res0 > Out[158]= True > > In[160]:= res3 = myMatch3[s1, s2]; // Timing > Out[160]= {0.131 Second, Null} > > (your algorithm, improved) > > In[161]:= res3 == res0 > Out[161]= True > > > Astonishingly, pretty old procedural programming is competitive: > > getLabelled[s_, labels_] := > Module[{i = 1, len = Length[s], r}, > (r = {}; > While[i <= len && Order[s[[i, 1]], #] > 0, ++i]; > While[i <= len && Order[s[[i, 1]], #] == 0, AppendTo[r, s[[i++]]] ]; > r) & /@ labels] > > myMatch4[s1_, s2_] := > Module[{ss1 = Sort[s1], ss2 = Sort[s2], k1, k2, keys, rr1, rr2}, > k1 = Transpose[ss1][[1]]; k2 = Transpose[ss2][[1]]; > keys = Intersection[k1, k2]; > rr1 = getLabelled[ss1, keys]; > rr2 = getLabelled[ss2, keys]; > Flatten[MapThread[Outer[List, #1, #2, 1] &, {rr1, rr2}], 2]] > > > In[235]:= res4 = myMatch4[s1, s2]; // Timing > Out[235]= {0.25 Second, Null} > > In[236]:= res4 == res0 > Out[236]= True > > (I know, this can be made a bit faster still!) > > > > Finally we also might compare to Sseziwa's solution > > In[172]:= > res5 = Join[Cases[s1, {#, _}], Cases[s2, {#, _}]] & /@ > Intersection[Union[Transpose[s1][[1]]], > Union[Transpose[s2][[1]]]]; // Timing > Out[172]= > {0.621 Second, Null} > > Although Union is unnecessary here, it gives a slight performance > improvement (another riddle of Intersection -- to me; I might guess WRI > could do something there). > > res4 does not correlate the pairs, so we just look for occurances: > > In[180]:= Union[Flatten[res0, 1]] == Union[Flatten[res5, 1]] > Out[180]= True > > > -- > Hartmut > >