Re: Union slowdown when SameTest is specified
- To: mathgroup at smc.vnet.net
- Subject: [mg101395] Re: [mg101378] Union slowdown when SameTest is specified
- From: danl at wolfram.com
- Date: Sat, 4 Jul 2009 06:45:03 -0400 (EDT)
- References: <200907030940.FAA19348@smc.vnet.net>
> Hello group! I was a little surprised and puzzled by the following: > > In[1]:= $Version > Out[1]= "7.0 for Mac OS X PowerPC (32-bit) (November 10, 2008)" > > In[2]:= data = RandomInteger[{1, 10}, {5000, 2, 12}]; > Timing[Union[data];] > Out[3]= {0.008138, Null} > > So far so good; so far so speedy. But why does this happen: > > In[4]:= Timing[Union[data, SameTest -> Equal];] > Out[4]= {15.313, Null} > > Or more egregiously, > > In[5]:= Timing[Union[data, SameTest -> (First@#1 == First@#2 &)];] > Out[5]= {86.293, Null} > > There's nothing special about the form of the data here, just it > happens to be similar in form to the data I was working on when I > experienced terrible slowdown with the SameTest that I've shown in the > third example - which doesn't seem like a demanding comparison. So > can anyone explain the orders-of-magnitude change in times here? Much > appreciated! Default behavior uses sorting and then comparisons of neighbors only. For a set of n elements you'd get O(n log(n)) behavior. (There is an oft-cited code for a Union-no-sort that is O(n) though with larger hidden constant, but that's another story). When given a SameTest, Union will not sort but rather test a given element against every other (remaining) element. This is in general O(n^2). Another drawback is that it must make a trip through the Mathematica evaluator to evaluate the provided SameTest. This can be considerably slower than internal code for the default test, and moreover the speed can scale with the complexity of the test (witness the distinction between your second and third examples). These are the issues behind the timing differences you are seeing. Daniel Lichtblau Wolfram Research
- References:
- Union slowdown when SameTest is specified
- From: J Siehler <jsiehler@gmail.com>
- Union slowdown when SameTest is specified