About the performance of Union[] and Tally[]
- To: mathgroup at smc.vnet.net
- Subject: [mg84970] About the performance of Union[] and Tally[]
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Mon, 21 Jan 2008 06:56:16 -0500 (EST)
Dear All, While replying to another post ("List complement operator"), I discovered something interesting about how Union[] and Tally[] perform on different lists, that you may find interesting/useful to know. When applied to a single list, Union[] and Tally[] are in a sense similar functions: it is very easy to obtain the union from the output of Tally[]. So let us compare their performance: The number of different elements in this list is much less than the length: In[1]:= a = RandomInteger[10000, {10000000}]; In[2]:= Timing[x = Union[a];] Out[2]= {7.266, Null} In[3]:= Timing[y = Sort[Tally[a][[All, 1]]];] Out[3]= {0.093, Null} In[4]:= x === y Out[4]= True So does this mean that it is always better to use Tally[] than Union[]? No! This is a list where the number of different elements is comparable with the length: In[5]:= b = RandomInteger[10000000, {10000000}]; In[6]:= Timing[x = Union[b];] Out[6]= {8.141, Null} In[7]:= Timing[y = Sort[Tally[b][[All, 1]]];] Out[7]= {19.328, Null} In[8]:= x === y Out[8]= True Just to check that it is not the Sort[] that's taking so long: In[9]:= Timing[Tally[b];] Out[9]= {14.75, Null} Any insights into why these functions behave like this is most welcome! -- Szabolcs