 
 
 
 
 
 
Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107180] Re: DeleteDuplicates is too slow?
- From: Clint Zeringue <info at scienceskillz.com>
- Date: Fri, 5 Feb 2010 03:19:48 -0500 (EST)
Thanks for all the great responses. 
I have summarized them below.
Use Tally or, even better, GatherBy, to obtain very substantial reductions in time:
In[1]:= data=RandomInteger[{1,99},{100000,2}];
In[2]:= 
sameQ[_,_]=False;
sameQ[{x_,y_},{x_,z_}]=True;
In[4]:= Timing[t0=DeleteDuplicates[data,sameQ];]
Out[4]= {7.987,Null}
In[5]:= Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]
Out[5]= 0.063
In[6]:= Timing[t2=#[[1]]&/@GatherBy[data,First];][[1]]
Out[6]= 0.016
In[7]:= t0===t1===t2
Out[7]= True
Tomas
Tomas Garza [tgarza10 at msn.com]
n = 100000;
data = RandomInteger[{0, 9}, {n, 2}] // N;
Length[DeleteDuplicates[data, #1[[1]] === #2[[1]] &]] // Timing
{1.39164,10}
Length[Union[data, SameTest -> (#1[[1]] === #2[[1]] &)]] // Timing
{0.289784,10}
Bob Hanlon
Bob Hanlon [hanlonr at cox.net]
Take care not to use N as a variable as it already has a built-in meaning.
If it is not necessary to keep the elements of the list in the same order, then a different, lower complexity algorithm can be used:
SplitBy[SortBy[data, First], First][[All, 1]]
This will be much faster, but will not remove exactly the same elements as DeleteDuplicates because the second element of the pairs is always ignored.  DeleteDuplicates will always keep the very first occurrence of equivalent elements.  Is this important for your calculation?
Szabolcs Horvát [szhorvat at gmail.com]
The fastest solution was Tomas Garza's :
Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[
- Follow-Ups:
- Re: Re: DeleteDuplicates is too slow?
- From: Leonid Shifrin <lshifr@gmail.com>
 
 
- Re: Re: DeleteDuplicates is too slow?

