Re: About the performance of Union[] and Tally[]
- To: mathgroup at smc.vnet.net
- Subject: [mg85011] Re: About the performance of Union[] and Tally[]
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Tue, 22 Jan 2008 05:40:35 -0500 (EST)
- References: <4794734F.50906@gmail.com>
Szabolcs Horvát wrote: > > Dear All, > > While replying to another post ("List complement operator"), I > discovered something interesting about how Union[] and Tally[] perform > on different lists, that you may find interesting/useful to know. > > When applied to a single list, Union[] and Tally[] are in a sense > similar functions: it is very easy to obtain the union from the output > of Tally[]. So let us compare their performance: > > > The number of different elements in this list is much less than the length: > > In[1]:= a = RandomInteger[10000, {10000000}]; > > In[2]:= Timing[x = Union[a];] > Out[2]= {7.266, Null} > > In[3]:= Timing[y = Sort[Tally[a][[All, 1]]];] > Out[3]= {0.093, Null} > > In[4]:= x === y > Out[4]= True > > So does this mean that it is always better to use Tally[] than Union[]? > No! This is a list where the number of different elements is > comparable with the length: > > In[5]:= b = RandomInteger[10000000, {10000000}]; > > In[6]:= Timing[x = Union[b];] > Out[6]= {8.141, Null} > > In[7]:= Timing[y = Sort[Tally[b][[All, 1]]];] > Out[7]= {19.328, Null} > > In[8]:= x === y > Out[8]= True > > Just to check that it is not the Sort[] that's taking so long: > > In[9]:= Timing[Tally[b];] > Out[9]= {14.75, Null} > > Any insights into why these functions behave like this is most welcome! > Well, actually this is not at all surprising: Tally[] needs to iterate over the list only once, but for every element, it has to decide whether it has seen it before or not. If there are many different elements, this is slow. If a long list is made up of only a few types of elements, it can be much faster than Union[] (whose speed is about the same as that of Sort[]). Anyway, it is good to know that in certain situations (few different types of elements, no need for sorting), there is a faster alternative to Union[]. -- Szabolcs