Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107209] Re: DeleteDuplicates is too slow?
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Fri, 5 Feb 2010 03:25:03 -0500 (EST)
On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF) wrote:

>Suppose you have the following.

>Data = RandomReal[1,{N,2}];
>sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>Timing[DeleteDuplicates[data,sameQ]][[1]];

>If N is a large number this takes an ungodly amount of time?
>Is there a more efficient way to delete the duplicate entries of
>Data ?

The slowness of DeleteDuplicates comes about when a custom compare function is used, as the following demonstrates:

In[27]:= sameQ[_, _] = False;
sameQ[x_, x_] = True;

In[29]:= data = RandomInteger[100, 2000];

In[30]:= Timing[Length[a = DeleteDuplicates[data]]]

Out[30]= {0.000025,101}

In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[31]= {0.215696,101}

In[32]:= a == b

Out[32]= True

The above is simply to illustrate the issue, not to solve your specific problem. For your case, I can get the same result in much less time using GatherBy, as illustrated below:

In[33]:= Clear[sameQ]

In[34]:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

In[36]:= data = RandomInteger[100, {2000, 2}];

In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[37]= {0.246448,101}

In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]

Out[38]= {0.000957,101}

In[39]:= c == b

Out[39]= True
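For the real-valued data in the original question, the same GatherBy idea can be sketched as follows. This is only a minimal sketch under a few assumptions of mine: the size n = 2000 is an arbitrary choice (the question leaves it unspecified, and I use lowercase n because N is a protected built-in symbol), and the names byGather and byTest are made up for illustration. The likely reason for the speed difference is that a custom test has to be applied to pairs of elements, while GatherBy only evaluates First once per row and groups rows by that key.

```mathematica
(* n is a hypothetical size; N itself is a built-in symbol and cannot be assigned *)
n = 2000;
data = RandomReal[1, {n, 2}];

(* Group rows by their first coordinate, then keep the first row of each group. *)
(* GatherBy keeps groups in order of first appearance, so original order survives. *)
byGather = First /@ GatherBy[data, First];

(* The slow route for comparison: a pairwise test on the first coordinates. *)
byTest = DeleteDuplicates[data, First[#1] === First[#2] &];

(* Both approaches keep the first occurrence of each key, so this should give True. *)
byGather === byTest
```

With RandomReal data, exact repeats of the first coordinate will be rare, so both results will usually just be the full list; the RandomInteger data used above is the better test of the deduplication itself. In Mathematica 10.0 and later, DeleteDuplicatesBy[data, First] expresses the same keyed deduplication directly.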
- Follow-Ups:
- Re: Re: Re: DeleteDuplicates is too slow?
- From: DrMajorBob <btreat1@austin.rr.com>
- Re: Re: DeleteDuplicates is too slow?
- From: DrMajorBob <btreat1@austin.rr.com>
- Re: Re: Re: DeleteDuplicates is too slow?