MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

About the performance of Union[] and Tally[]

  • To: mathgroup at smc.vnet.net
  • Subject: [mg84970] About the performance of Union[] and Tally[]
  • From: Szabolcs Horvát <szhorvat at gmail.com>
  • Date: Mon, 21 Jan 2008 06:56:16 -0500 (EST)

Dear All,

While replying to another post ("List complement operator"), I 
discovered something interesting about how Union[] and Tally[] perform 
on different lists, that you may find interesting/useful to know.

When applied to a single list, Union[] and Tally[] are in a sense 
similar functions:  it is very easy to obtain the union from the output 
of Tally[].  So let us compare their performance:


The number of different elements in this list is much less than the length:

In[1]:= a = RandomInteger[10000, {10000000}];

In[2]:= Timing[x = Union[a];]
Out[2]= {7.266, Null}

In[3]:= Timing[y = Sort[Tally[a][[All, 1]]];]
Out[3]= {0.093, Null}

In[4]:= x === y
Out[4]= True

So does this mean that it is always better to use Tally[] than Union[]? 
  No!  This is a list where the number of different elements is 
comparable with the length:

In[5]:= b = RandomInteger[10000000, {10000000}];

In[6]:= Timing[x = Union[b];]
Out[6]= {8.141, Null}

In[7]:= Timing[y = Sort[Tally[b][[All, 1]]];]
Out[7]= {19.328, Null}

In[8]:= x === y
Out[8]= True

Just to check that it is not the Sort[] that's taking so long:

In[9]:= Timing[Tally[b];]
Out[9]= {14.75, Null}

Any insights into why these functions behave like this is most welcome!

-- 
Szabolcs


  • Prev by Date: Re: Help with Eliminate[]
  • Next by Date: Optimization in Mathmatica
  • Previous by thread: Re: Polylog equations
  • Next by thread: Re: About the performance of Union[] and Tally[]