About the performance of Union[] and Tally[]
- To: mathgroup at smc.vnet.net
- Subject: [mg84970] About the performance of Union[] and Tally[]
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Mon, 21 Jan 2008 06:56:16 -0500 (EST)
Dear All,
While replying to another post ("List complement operator"), I
discovered something interesting about how Union[] and Tally[] perform
on different lists, that you may find interesting/useful to know.
When applied to a single list, Union[] and Tally[] are in a sense
similar functions: it is very easy to obtain the union from the output
of Tally[]. So let us compare their performance:
The number of different elements in this list is much less than the length:
In[1]:= a = RandomInteger[10000, {10000000}];
In[2]:= Timing[x = Union[a];]
Out[2]= {7.266, Null}
In[3]:= Timing[y = Sort[Tally[a][[All, 1]]];]
Out[3]= {0.093, Null}
In[4]:= x === y
Out[4]= True
So does this mean that it is always better to use Tally[] than Union[]?
No! This is a list where the number of different elements is
comparable with the length:
In[5]:= b = RandomInteger[10000000, {10000000}];
In[6]:= Timing[x = Union[b];]
Out[6]= {8.141, Null}
In[7]:= Timing[y = Sort[Tally[b][[All, 1]]];]
Out[7]= {19.328, Null}
In[8]:= x === y
Out[8]= True
Just to check that it is not the Sort[] that's taking so long:
In[9]:= Timing[Tally[b];]
Out[9]= {14.75, Null}
Any insights into why these functions behave like this is most welcome!
--
Szabolcs