Re: DeleteDuplicates is too slow?

• To: mathgroup at smc.vnet.net
• Subject: [mg107209] Re: DeleteDuplicates is too slow?
• From: Bill Rowe <readnews at sbcglobal.net>
• Date: Fri, 5 Feb 2010 03:25:03 -0500 (EST)

```
On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
M Civ USAF AFMC AFRL/RDLAF) wrote:

>Suppose you have the following.

>data = RandomReal[1,{N,2}];

>sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;

>Timing[DeleteDuplicates[data,sameQ]][[1]];

>If N is a large number this takes an ungodly amount of time?

>Is there a more efficient way to delete the duplicate entries of
>data?

DeleteDuplicates becomes slow when a custom comparison function
is supplied, presumably because each element must then be tested
pairwise against the others, as the following demonstrates:

In[27]:= sameQ[_, _] = False;
sameQ[x_, x_] = True;

In[29]:= data = RandomInteger[100, 2000];

In[30]:= Timing[Length[a = DeleteDuplicates[data]]]

Out[30]= {0.000025,101}

In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[31]= {0.215696,101}

In[32]:= a == b

Out[32]= True

The above simply illustrates the issue; it does not solve your
specific problem. For your case, I can get the same result in
much less time using GatherBy, as illustrated below:

In[33]:= Clear[sameQ]

In[34]:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

In[36]:= data = RandomInteger[100, {2000, 2}];

In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[37]= {0.246448,101}

In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]

Out[38]= {0.000957,101}

In[39]:= c == b

Out[39]= True

```
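
The GatherBy idea in the reply can be packaged as a small helper. What follows is a minimal sketch, not the poster's own code: the names dedupeByFirst and sameFirstQ are hypothetical and introduced only for illustration, and SeedRandom is added so the run is repeatable. It assumes, as in the original question, that each row is a pair and that two rows count as duplicates when their first elements are identical.

```mathematica
(* Hypothetical helper: keep the first row seen for each distinct first element.
   GatherBy groups rows by key in order of first appearance, so taking the
   first member of each group keeps the earliest occurrence. *)
dedupeByFirst[rows_List] := First /@ GatherBy[rows, First]

(* Pairwise test equivalent to the sameQ from the thread (hypothetical name). *)
sameFirstQ[{x_, _}, {x_, _}] := True
sameFirstQ[_, _] := False

SeedRandom[1];  (* assumption: fixed seed, only for repeatability *)
data = RandomInteger[100, {2000, 2}];

fast = dedupeByFirst[data];                 (* keyed grouping *)
slow = DeleteDuplicates[data, sameFirstQ];  (* pairwise comparison *)

(* Both approaches should keep the same rows in the same order,
   in line with Out[39] in the session above. *)
fast === slow
```

The design choice is to reduce each row to a key (here First) and let the system group on that key, rather than asking DeleteDuplicates to call a user-defined test on pairs of rows. In Mathematica 10.0 and later, DeleteDuplicatesBy[data, First] expresses the same keyed idea directly; it was not available when this thread was written.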
