Re: DeleteDuplicates is too slow?

• To: mathgroup at smc.vnet.net
• Subject: [mg107180] Re: DeleteDuplicates is too slow?
• From: Clint Zeringue <info at scienceskillz.com>
• Date: Fri, 5 Feb 2010 03:19:48 -0500 (EST)

```
Thanks for all the great responses.

I have summarized them below.

Use Tally or, even better, GatherBy, to obtain very substantial reductions in time:

In[1]:= data=RandomInteger[{1,99},{100000,2}];

In[2]:=
sameQ[_,_]=False;
sameQ[{x_,y_},{x_,z_}]=True;

In[4]:= Timing[t0=DeleteDuplicates[data,sameQ];]
Out[4]= {7.987,Null}

In[5]:= Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]
Out[5]= 0.063

In[6]:= Timing[t2=#[[1]]&/@GatherBy[data,First];][[1]]
Out[6]= 0.016

In[7]:= t0===t1===t2
Out[7]= True

Tomas
Tomas Garza [tgarza10 at msn.com]
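
(* Why the timings differ so much: DeleteDuplicates with an explicit
   SameTest compares elements pairwise, which scales quadratically,
   while GatherBy groups by the computed key (First), which scales
   roughly linearly. In versions newer than this thread (Mathematica
   10 and later), DeleteDuplicatesBy does the keyed deletion directly;
   a sketch, assuming a version-10+ kernel: *)
t3 = DeleteDuplicatesBy[data, First];
t3 === t2  (* same representatives: both keep the first occurrence per key *)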

n = 100000;

data = RandomInteger[{0, 9}, {n, 2}] // N;

Length[DeleteDuplicates[data, #1[[1]] === #2[[1]] &]] // Timing

{1.39164,10}

Length[Union[data, SameTest -> (#1[[1]] === #2[[1]] &)]] // Timing

{0.289784,10}

Bob Hanlon
Bob Hanlon [hanlonr at cox.net]
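
(* Note that Union returns its result in sorted order, while
   DeleteDuplicates keeps the order of first appearance, so the two
   lists need not be identical even though their lengths must agree:
   both keep one representative per distinct first element. *)
u = Union[data, SameTest -> (#1[[1]] === #2[[1]] &)];
d = DeleteDuplicates[data, #1[[1]] === #2[[1]] &];
Length[u] === Length[d]  (* True *)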

Take care not to use N as a variable as it already has a built-in meaning.
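
(* For example, an assignment to N fails with Set::wrsym, because
   built-in symbols are Protected: *)
N = 100000;  (* Set::wrsym: Symbol N is Protected. *)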

If it is not necessary to keep the elements of the list in the same order, then a different, lower complexity algorithm can be used:

SplitBy[SortBy[data, First], First][[All, 1]]

This will be much faster, but will not remove exactly the same elements as DeleteDuplicates because the second element of the pairs is always ignored.  DeleteDuplicates will always keep the very first occurrence of equivalent elements.  Is this important for your calculation?
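
(* A small illustration of the ordering difference, using hypothetical
   data (a, b, c, d are just undefined tags): *)
small = {{3, a}, {1, b}, {3, c}, {1, d}};
DeleteDuplicates[small, #1[[1]] == #2[[1]] &]
(* {{3, a}, {1, b}} : first occurrences, in original order *)
SplitBy[SortBy[small, First], First][[All, 1]]
(* {{1, b}, {3, a}} : same representatives here, but sorted by key *)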

Szabolcs Horvát [szhorvat at gmail.com]

The fastest solution was Tomas Garza's:

Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]

```

• Prev by Date: Re: Diagonalizing large matrices
• Next by Date: Re: Bug? Analytical integration of cosines gets the sign wrong
• Previous by thread: Re: Re: Re: DeleteDuplicates is too slow?
• Next by thread: Re: Re: DeleteDuplicates is too slow?