MathGroup Archive 2008

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: About the performance of Union[] and Tally[]

  • To: mathgroup at smc.vnet.net
  • Subject: [mg85011] Re: About the performance of Union[] and Tally[]
  • From: Szabolcs Horvát <szhorvat at gmail.com>
  • Date: Tue, 22 Jan 2008 05:40:35 -0500 (EST)
  • References: <4794734F.50906@gmail.com>

Szabolcs Horvát wrote:
> 
> Dear All,
> 
> While replying to another post ("List complement operator"), I 
> discovered something interesting about how Union[] and Tally[] perform 
> on different lists, that you may find interesting/useful to know.
> 
> When applied to a single list, Union[] and Tally[] are in a sense 
> similar functions:  it is very easy to obtain the union from the output 
> of Tally[].  So let us compare their performance:
> 
> 
> The number of different elements in this list is much less than the length:
> 
> In[1]:= a = RandomInteger[10000, {10000000}];
> 
> In[2]:= Timing[x = Union[a];]
> Out[2]= {7.266, Null}
> 
> In[3]:= Timing[y = Sort[Tally[a][[All, 1]]];]
> Out[3]= {0.093, Null}
> 
> In[4]:= x === y
> Out[4]= True
> 
> So does this mean that it is always better to use Tally[] than Union[]? 
>  No!  This is a list where the number of different elements is 
> comparable with the length:
> 
> In[5]:= b = RandomInteger[10000000, {10000000}];
> 
> In[6]:= Timing[x = Union[b];]
> Out[6]= {8.141, Null}
> 
> In[7]:= Timing[y = Sort[Tally[b][[All, 1]]];]
> Out[7]= {19.328, Null}
> 
> In[8]:= x === y
> Out[8]= True
> 
> Just to check that it is not the Sort[] that's taking so long:
> 
> In[9]:= Timing[Tally[b];]
> Out[9]= {14.75, Null}
> 
> Any insights into why these functions behave like this is most welcome!
> 

Well, actually this is not at all surprising:  Tally[] needs to iterate 
over the list only once, but for every element, it has to decide whether 
it has seen it before or not.  If there are many different elements, 
this is slow.  If a long list is made up of only a few types of 
elements, it can be much faster than Union[] (whose speed is about the 
same as that of Sort[]).

Anyway, it is good to know that in certain situations (few different 
types of elements, no need for sorting), there is a faster alternative 
to Union[].

-- 
Szabolcs


  • Prev by Date: Re: Question: how does mathematica determine the coefficient of the
  • Next by Date: Re: Why can't Mathematica simplify this?
  • Previous by thread: About the performance of Union[] and Tally[]
  • Next by thread: Optimization in Mathmatica