Re: Efficient Sorting Algorithm
- To: mathgroup at smc.vnet.net
- Subject: [mg38222] Re: [mg38140] Efficient Sorting Algorithm
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Thu, 5 Dec 2002 03:35:55 -0500 (EST)
- References: <200212030935.EAA14859@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Brian Higgins wrote: > > Hi, > > I have two lists (in general of different length) that have the > following structure > > s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {100}]; > s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]], > Random[Integer, {1, 2000}]}, {200}]; > > I am then interested in obtaining all the strings in s1 that match > with those in s2. At the present time I use the following agorithm to > find the matches > > myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList, > (First[First[#]] == First[Last[#]]) &] > > This works fine, but when s1 and s2 are large ( e.g. 3000 or more > elements) then Outer seems inefficient. My question: does anyone have > a suggestion that would be more efficient than my kludge approach. > Note I have tried Intersection, which finds all the matches, i.e. > > myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)]; > > But I have not been successful in selecting all the matched pairs > using myList2 > > Thanks in advance for any suggestions. > > Brian In private e-mail I sent a suggestion along the lines implemented below. Now that I have coded it I realize there are some open questions about what exactly is desired. Specifically, it is not clear to me how you want to handle cases where the same string appears in both lists, and more than once in at least one list. The code below may not do what you want for such cases (it will look for further "pairs" and, if any such are found, lump them separately). Also it sorts for efficiency, hence will not maintain order with respect to either input list (I doubt that is an issue since in any case it is not possible to maintain it with respect to both lists). myTest[l1_,l2_] := Module[{s1,s2,m=Length[l1],n=Length[l2],j,k,res={},ord}, s1 = Sort[l1]; s2 = Sort[l2]; For [j=1; k=1, j<=m && k<=n, Null, ord = Order[s1[[j,1]],s2[[k,1]]]; If [ord==1, j++; Continue[]]; If [ord==-1, k++; Continue[]]; res = {res, {s1[[j]],s2[[k]]}}; j++; k++; ]; Partition[Partition[Flatten[res], 2], 2] ] l1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 90}], {4}]], Random[Integer, {1, 2000}]}, {10^5}]; l2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 90}], {4}]], Random[Integer, {1, 2000}]}, {10^5}]; In[77]:= Timing[res = myTest2[l1,l2];] Out[77]= {5.19 Second, Null} This is on a 1.5 GHz machine. Note that this speed is by and large independent of the size of the result, hence the fact that I increased the alphabet range to A-Z is not an issue. One can probably devise ways to make this faster still by avoiding strings and instead using integers exclusively; in that case one can use Compile and in other ways take advantage of better speed of processing machine integer matrices in Mathematica. Daniel Lichtblau Wolfram Research
- References:
- Efficient Sorting Algorithm
- From: bghiggins@ucdavis.edu (Brian Higgins)
- Efficient Sorting Algorithm