Re: Re: Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107264] Re: [mg107233] Re: [mg107209] Re: DeleteDuplicates is too slow?
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Sun, 7 Feb 2010 06:13:08 -0500 (EST)
- References: <201002050825.DAA06790@smc.vnet.net>
- Reply-to: drmajorbob at yahoo.com
Sorry... the second version is less verbose. (Obviously.)

Bobby

On Sat, 06 Feb 2010 02:24:16 -0600, DrMajorBob <btreat1 at austin.rr.com> wrote:

> Notice that
>
> sameQ@## &
>
> is the same as (and less verbose than)
>
> sameQ
>
> Bobby
>
> On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews at sbcglobal.net>
> wrote:
>
>> On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
>> M Civ USAF AFMC AFRL/RDLAF) wrote:
>>
>>> Suppose you have the following.
>>
>>> Data = RandomReal[1,{N,2}];
>>
>>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>>
>>> Timing[DeleteDuplicates[data,sameQ]][[1]];
>>
>>> If N is a large number this takes an ungodly amount of time?
>>
>>> Is there a more efficient way to delete the duplicate entries of
>>> Data ?
>>
>> The slowness of DeleteDuplicates comes about when a custom
>> compare function is used as the following demonstrates
>>
>> In[27]:= sameQ[_, _] = False;
>> sameQ[x_, x_] = True;
>>
>> In[29]:= data = RandomInteger[100, 2000];
>>
>> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>>
>> Out[30]= {0.000025,101}
>>
>> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[31]= {0.215696,101}
>>
>> In[32]:= a == b
>>
>> Out[32]= True
>>
>> The above is simply to illustrate the issue, not to solve your
>> specific problem. For your case, I can get the same result in
>> much less time using GatherBy as illustrated below
>>
>> In[33]:= Clear[sameQ]
>>
>> In[34]:= sameQ[_, _] = False;
>> sameQ[{x_, y_}, {x_, z_}] = True;
>>
>> In[36]:= data = RandomInteger[100, {2000, 2}];
>>
>> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[37]= {0.246448,101}
>>
>> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>>
>> Out[38]= {0.000957,101}
>>
>> In[39]:= c == b
>>
>> Out[39]= True
>>
>
--
DrMajorBob at yahoo.com
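A minimal sketch of the two approaches discussed in this thread, written in the same Wolfram Language. With a custom test, DeleteDuplicates has to apply that test to pairs of elements, so its cost grows roughly with the list length times the number of distinct entries kept. GatherBy instead computes one key per element (here First) and groups on it. The helper name firstByKey is hypothetical and is introduced only for illustration. The code assumes, as in the thread, that two pairs count as duplicates exactly when their first elements are identical.

```mathematica
(* Pairwise test from the thread: two pairs count as duplicates
   when their first elements are identical. *)
sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

(* Hypothetical helper, for illustration only: group the pairs by
   their first element and keep the first pair in each group. *)
firstByKey[list_List] := First /@ GatherBy[list, First];

data = RandomInteger[100, {2000, 2}];

(* sameQ@## & just forwards all of its arguments to sameQ, so passing
   sameQ itself gives the same result. This call applies the test to
   pairs of elements and is the slow path. *)
slow = DeleteDuplicates[data, sameQ];

(* One key per element: the fast path. *)
fast = firstByKey[data];

(* Both keep the first occurrence of each first element, in order
   of first appearance, so the results should be identical. *)
slow === fast
```

In Mathematica 10 and later, which postdate this thread, DeleteDuplicatesBy[data, First] performs the same keyed deduplication in a single call.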
- References:
- Re: DeleteDuplicates is too slow?
- From: Bill Rowe <readnews@sbcglobal.net>
- Re: DeleteDuplicates is too slow?