MathGroup Archive 2010

Re: Re: DeleteDuplicates is too slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107233] Re: [mg107209] Re: DeleteDuplicates is too slow?
  • From: DrMajorBob <btreat1 at austin.rr.com>
  • Date: Sat, 6 Feb 2010 03:24:16 -0500 (EST)
  • References: <201002050825.DAA06790@smc.vnet.net>
  • Reply-to: drmajorbob at yahoo.com

Notice that

sameQ@## &

is the same as (and less verbose than)

sameQ
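A quick way to check the equivalence, assuming the two-argument sameQ defined in the quoted message below. The variable names viaWrapper and viaSymbol are only for illustration. Because `##` splices both arguments into sameQ, the wrapper evaluates to exactly the same call, so both forms should return identical lists.

```mathematica
(* sameQ as in the quoted message: two pairs count as duplicates
   when their first elements are identical *)
Clear[sameQ];
sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

data = RandomInteger[100, {2000, 2}];

(* sameQ@## & just rebuilds sameQ[a, b] from its two arguments *)
viaWrapper = DeleteDuplicates[data, sameQ@## &];
viaSymbol = DeleteDuplicates[data, sameQ];

viaWrapper === viaSymbol
```

Dropping the wrapper saves one pure-function call per comparison. It presumably does not change the overall scaling, since DeleteDuplicates still has to apply the custom test pairwise.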

Bobby

On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews at sbcglobal.net> wrote:

> On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
> M Civ USAF AFMC AFRL/RDLAF) wrote:
>
>> Suppose you have the following.
>
>> data = RandomReal[1, {n, 2}];
>
>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>
>> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
>> If n is large, this takes an ungodly amount of time.
>
>> Is there a more efficient way to delete the duplicate entries of
>> data?
>
> The slowness of DeleteDuplicates comes about when a custom
> comparison function is used, as the following demonstrates:
>
> In[27]:= sameQ[_, _] = False;
> sameQ[x_, x_] = True;
>
> In[29]:= data = RandomInteger[100, 2000];
>
> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>
> Out[30]= {0.000025,101}
>
> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[31]= {0.215696,101}
>
> In[32]:= a == b
>
> Out[32]= True
>
> The above is simply to illustrate the issue, not to solve your
> specific problem. For your case, I can get the same result in
> much less time using GatherBy, as illustrated below:
>
> In[33]:= Clear[sameQ]
>
> In[34]:= sameQ[_, _] = False;
> sameQ[{x_, y_}, {x_, z_}] = True;
>
> In[36]:= data = RandomInteger[100, {2000, 2}];
>
> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[37]= {0.246448,101}
>
> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>
> Out[38]= {0.000957,101}
>
> In[39]:= c == b
>
> Out[39]= True
>
>

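A sketch of the GatherBy approach on data shaped like the original question (real-valued pairs). It assumes a hypothetical size n = 1000 and injects 200 rows that reuse an existing first coordinate, since with RandomReal alone repeated first coordinates are essentially impossible. Because GatherBy forms groups in order of first appearance, taking First of each group should reproduce what DeleteDuplicates[data, sameQ] keeps, in the same order.

```mathematica
(* hypothetical size: the original question leaves N unspecified *)
n = 1000;
data = RandomReal[1, {n, 2}];

(* add 200 rows that reuse existing first coordinates
   with fresh second coordinates, so duplicates actually occur *)
data = Join[data, {First[#], RandomReal[]} & /@ RandomSample[data, 200]];

Clear[sameQ];
sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

(* custom test: rows are compared pairwise *)
{tDD, viaDD} = Timing[DeleteDuplicates[data, sameQ]];

(* group rows by first coordinate, then keep the first row of each group *)
{tGather, viaGather} = Timing[First /@ GatherBy[data, First]];

{tDD, tGather, viaDD === viaGather}
```

No particular timings are claimed for this sketch; on the 2000-row integer example quoted above, the gap was roughly 0.25 s versus 0.001 s.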

-- 
DrMajorBob at yahoo.com