Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107209] Re: DeleteDuplicates is too slow?
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Fri, 5 Feb 2010 03:25:03 -0500 (EST)
On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
M Civ USAF AFMC AFRL/RDLAF) wrote:
>Suppose you have the following.
>Data = RandomReal[1,{N,2}];
>sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>Timing[DeleteDuplicates[data,sameQ]][[1]];
>If N is a large number this takes an ungodly amount of time?
>Is there a more efficient way to delete the duplicate entries of
>Data ?
The slowness of DeleteDuplicates arises when a custom
comparison function is used, as the following demonstrates:
In[27]:= sameQ[_, _] = False;
sameQ[x_, x_] = True;
In[29]:= data = RandomInteger[100, 2000];
In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
Out[30]= {0.000025,101}
In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
Out[31]= {0.215696,101}
In[32]:= a == b
Out[32]= True
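The gap comes from how the two calls work: with the default comparison, DeleteDuplicates can use fast internal (hash-based) equality, but with a custom test it must call the test on pairs of elements, comparing each candidate against the elements already kept. You can see the number of calls directly by wrapping the test in a counter (a sketch; `calls` and `countingSameQ` are illustrative names):

In[33]:= calls = 0;
countingSameQ[x_, y_] := (calls++; x === y)
In[35]:= DeleteDuplicates[data, countingSameQ];
In[36]:= calls

With 2000 elements and about 101 distinct values, this is on the order of 2000*101 calls to the test function, which is where the time goes.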
The above is simply to illustrate the issue, not to solve your
specific problem. For your case, I get the same result in
much less time using GatherBy, as illustrated below:
In[33]:= Clear[sameQ]
In[34]:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;
In[36]:= data = RandomInteger[100, {2000, 2}];
In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
Out[37]= {0.246448,101}
In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
Out[38]= {0.000957,101}
In[39]:= c == b
Out[39]= True
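GatherBy is fast here because it groups elements by the value of First rather than testing pairs, and it preserves the order of first appearance, so taking First of each group keeps exactly the first row seen for each key. If you prefer to avoid building the groups, the same "keep the first occurrence of each key" behavior can be sketched with an explicit seen-table (illustrative names, not a built-in):

In[40]:= keepFirstByKey[list_, keyF_] :=
  Module[{seen},
   seen[_] = False;
   Select[list, If[seen[keyF[#]], False, seen[keyF[#]] = True] &]]
In[41]:= keepFirstByKey[data, First] == First /@ GatherBy[data, First]
Out[41]= True

Here the Set inside the If returns True the first time a key is encountered, so Select keeps exactly one row per distinct first element, in order.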