Re: Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107233] Re: [mg107209] Re: DeleteDuplicates is too slow?
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Sat, 6 Feb 2010 03:24:16 -0500 (EST)
- References: <201002050825.DAA06790@smc.vnet.net>
- Reply-to: drmajorbob at yahoo.com
Notice that sameQ@## & is the same as (and more verbose than) plain sameQ.

Bobby

On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews at sbcglobal.net> wrote:

> On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
> M Civ USAF AFMC AFRL/RDLAF) wrote:
>
>> Suppose you have the following.
>>
>> Data = RandomReal[1,{N,2}];
>>
>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>>
>> Timing[DeleteDuplicates[data,sameQ]][[1]];
>>
>> If N is a large number this takes an ungodly amount of time?
>>
>> Is there a more efficient way to delete the duplicate entries of
>> Data ?
>
> The slowness of DeleteDuplicates comes about when a custom
> compare function is used as the following demonstrates
>
> In[27]:= sameQ[_, _] = False;
> sameQ[x_, x_] = True;
>
> In[29]:= data = RandomInteger[100, 2000];
>
> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>
> Out[30]= {0.000025,101}
>
> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[31]= {0.215696,101}
>
> In[32]:= a == b
>
> Out[32]= True
>
> The above is simply to illustrate the issue, not to solve your
> specific problem. For your case, I can get the same result in
> much less time using GatherBy as illustrated below
>
> In[33]:= Clear[sameQ]
>
> In[34]:= sameQ[_, _] = False;
> sameQ[{x_, y_}, {x_, z_}] = True;
>
> In[36]:= data = RandomInteger[100, {2000, 2}];
>
> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[37]= {0.246448,101}
>
> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>
> Out[38]= {0.000957,101}
>
> In[39]:= c == b
>
> Out[39]= True

--
DrMajorBob at yahoo.com
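Both points in this exchange can be checked in a single session: passing a test function directly to DeleteDuplicates behaves the same as wrapping it in sameQ@## &, and keying on the first coordinate gives the same result as the pairwise test. This is a minimal sketch, not code from the thread. The names pairQ, pts, viaTest, viaPure, viaGather and viaBy are hypothetical, and DeleteDuplicatesBy assumes Mathematica 10 or later (it did not exist when this thread was written).

```mathematica
(* Hypothetical test: two pairs count as duplicates when their first elements match *)
ClearAll[pairQ];
pairQ[{x_, _}, {x_, _}] := True;
pairQ[_, _] := False;

SeedRandom[1];
pts = RandomInteger[100, {2000, 2}];

(* With an arbitrary test no hashing is possible, so each element has to be
   compared against the elements kept so far *)
viaTest = DeleteDuplicates[pts, pairQ];

(* Wrapping the test in a pure function changes only the verbosity *)
viaPure = DeleteDuplicates[pts, pairQ@## &];

(* Key-based alternatives: group by the first coordinate and keep the
   first member of each group, in order of first appearance *)
viaGather = First /@ GatherBy[pts, First];
viaBy = DeleteDuplicatesBy[pts, First];  (* assumes Mathematica 10+ *)

{viaTest === viaPure, viaTest === viaGather, viaTest === viaBy}
(* expected: {True, True, True} *)

(* Rough timing comparison; absolute numbers depend on the machine *)
First@AbsoluteTiming[DeleteDuplicates[pts, pairQ];]
First@AbsoluteTiming[DeleteDuplicatesBy[pts, First];]
```

The key-based forms only apply when "duplicate" can be expressed as equality of some key, here the first coordinate. For a test that is not an equivalence relation, such as approximate closeness of real numbers, the pairwise form of DeleteDuplicates is still the one to use.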
- References:
- Re: DeleteDuplicates is too slow?
- From: Bill Rowe <readnews@sbcglobal.net>