MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Efficient Sorting Algorithm

  • To: mathgroup at smc.vnet.net
  • Subject: [mg38222] Re: [mg38140] Efficient Sorting Algorithm
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Thu, 5 Dec 2002 03:35:55 -0500 (EST)
  • References: <200212030935.EAA14859@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Brian Higgins wrote:
> 
> Hi,
> 
> I have two lists (in general of different length) that have the
> following structure
> 
> s1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>        Random[Integer, {1, 2000}]}, {100}];
> s2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 69}], {4}]],
>        Random[Integer, {1, 2000}]}, {200}];
> 
> I am then interested in obtaining  all the strings in s1 that match
> with those in s2. At the present time I use the following agorithm to
> find the matches
> 
> myList = Flatten[Outer[List, s1, s2, 1], 1];Select[myList,
> (First[First[#]] == First[Last[#]]) &]
> 
> This works fine, but when s1 and s2 are large  ( e.g. 3000 or more
> elements) then Outer seems inefficient.  My question: does anyone have
> a suggestion that would be more efficient than my kludge approach.
> Note I have tried Intersection, which finds all the matches, i.e.
> 
> myList2 = Intersection[s1,s2, SameTest -> (#1[[1]] == #2[[1]] &)];
> 
> But I have not been successful in selecting all the matched pairs
> using myList2
> 
> Thanks in advance for any suggestions.
> 
> Brian

In private e-mail I sent a suggestion along the lines implemented below.
Now that I have coded it I realize there are some open questions about
what exactly is desired. Specifically, it is not clear to me how you
want to handle cases where the same string appears in both lists, and
more than once in at least one list. The code below may not do what you
want for such cases (it will look for further "pairs" and, if any such
are found, lump them separately). Also it sorts for efficiency, hence
will not maintain order with respect to either input list (I doubt that
is an issue since in any case it is not possible to maintain it with
respect to both lists).

myTest[l1_,l2_] :=
Module[{s1,s2,m=Length[l1],n=Length[l2],j,k,res={},ord},
  s1 = Sort[l1];
  s2 = Sort[l2];
  For [j=1; k=1, j<=m && k<=n, Null,
    ord = Order[s1[[j,1]],s2[[k,1]]];
    If [ord==1, j++; Continue[]];
    If [ord==-1, k++; Continue[]];
    res = {res, {s1[[j]],s2[[k]]}};
    j++; k++;
    ];
  Partition[Partition[Flatten[res], 2], 2]
  ]

l1 = Table[{FromCharacterCode[Table[Random[Integer, {65, 90}], {4}]], 
  Random[Integer, {1, 2000}]}, {10^5}];
l2 = Table[{FromCharacterCode[Table[Random[Integer, {65, 90}], {4}]], 
  Random[Integer, {1, 2000}]}, {10^5}];

In[77]:= Timing[res = myTest2[l1,l2];]
Out[77]= {5.19 Second, Null}

This is on a 1.5 GHz machine. Note that this speed is by and large
independent of the size of the result, hence the fact that I increased
the alphabet range to A-Z is not an issue. One can probably devise ways
to make this faster still by avoiding strings and instead using integers
exclusively; in that case one can use Compile and in other ways take
advantage of better speed of processing machine integer matrices in
Mathematica.

Daniel Lichtblau
Wolfram Research


  • Prev by Date: Re: Re: Efficient Sorting Algorithm[3]
  • Next by Date: Newsgroup/Mailing Delays due to ice storm
  • Previous by thread: Efficient Sorting Algorithm
  • Next by thread: Re: Efficient Sorting Algorithm