Re: Union slowdown when SameTest is specified

• To: mathgroup at smc.vnet.net
• Subject: [mg101395] Re: [mg101378] Union slowdown when SameTest is specified
• From: danl at wolfram.com
• Date: Sat, 4 Jul 2009 06:45:03 -0400 (EDT)
• References: <200907030940.FAA19348@smc.vnet.net>

```> Hello group!  I was a little surprised and puzzled by the following:
>
> In[1]:= \$Version
> Out[1]= "7.0 for Mac OS X PowerPC (32-bit) (November 10, 2008)"
>
> In[2]:= data = RandomInteger[{1, 10}, {5000, 2, 12}];
> Timing[Union[data];]
> Out[3]= {0.008138, Null}
>
> So far so good; so far so speedy.  But why does this happen:
>
> In[4]:= Timing[Union[data, SameTest -> Equal];]
> Out[4]= {15.313, Null}
>
> Or more egregiously,
>
> In[5]:= Timing[Union[data, SameTest -> (First@#1 == First@#2 &)];]
> Out[5]= {86.293, Null}
>
> There's nothing special about the form of the data here, just it
> happens to be similar in form to the data I was working on when I
> experienced terrible slowdown with the SameTest that I've shown in the
> third example - which doesn't seem like a demanding comparison.    So
> can anyone explain the orders-of-magnitude change in times here?  Much
> appreciated!

Default behavior uses sorting and then comparisons of neighbors only. For
a set of n elements you'd get O(n log(n)) behavior. (There is an oft-cited
code for a Union-no-sort that is O(n) though with larger hidden constant,
but that's another story).

When given a SameTest, Union will not sort but rather test a given element
against every other (remaining) element. This is in general O(n^2).
Another drawback is that it must make a trip through the Mathematica
evaluator to evaluate the provided SameTest. This can be considerably
slower than internal code for the default test, and moreover the speed can
scale with the complexity of the test (witness the distinction between
your second and third examples). These are the issues behind the timing
differences you are seeing.

Daniel Lichtblau
Wolfram Research

```

• Prev by Date: Re: Re: Is Orange translucent? - What Methods exist?
• Next by Date: Re: Union slowdown when SameTest is specified
• Previous by thread: Re: Union slowdown when SameTest is specified
• Next by thread: Re: Union slowdown when SameTest is specified