MathGroup Archive 2010

Re: Re: Re: DeleteDuplicates is too slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107264] Re: [mg107233] Re: [mg107209] Re: DeleteDuplicates is too slow?
  • From: DrMajorBob <btreat1 at austin.rr.com>
  • Date: Sun, 7 Feb 2010 06:13:08 -0500 (EST)
  • References: <201002050825.DAA06790@smc.vnet.net>
  • Reply-to: drmajorbob at yahoo.com

Sorry... the second version is less verbose. (Obviously.)
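
A quick check of that equivalence, as a minimal sketch: it assumes the sameQ definitions from the quoted message below, and the names wrapped, pairs and agree are made up here for illustration.

```mathematica
(* Definitions copied from the quoted message *)
Clear[sameQ];
sameQ[_, _] = False;
sameQ[x_, x_] = True;

(* Pure function that forwards all of its arguments to sameQ *)
wrapped = (sameQ@## &);

(* Compare the wrapped and the bare form on a few argument pairs *)
pairs = {{1, 1}, {1, 2}, {"a", "a"}, {{1, 2}, {1, 3}}};
agree = Map[Function[p, wrapped @@ p === sameQ @@ p], pairs]
```

Every entry of agree should be True, since wrapped[a, b] simply evaluates sameQ[a, b]; the wrapper only adds one more function call per comparison.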

Bobby

On Sat, 06 Feb 2010 02:24:16 -0600, DrMajorBob <btreat1 at austin.rr.com>  
wrote:

> Notice that
>
> sameQ@## &
>
> is the same as (and less verbose than)
>
> sameQ
>
> Bobby
>
> On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews at sbcglobal.net>
> wrote:
>
>> On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
>> M Civ USAF AFMC AFRL/RDLAF) wrote:
>>
>>> Suppose you have the following.
>>
>>> data = RandomReal[1,{n,2}];
>>
>>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>>
>>> Timing[DeleteDuplicates[data,sameQ]][[1]]
>>
>>> If n is large, this takes an ungodly amount of time.
>>
>>> Is there a more efficient way to delete the duplicate entries of
>>> data?
>>
>> DeleteDuplicates becomes slow when a custom comparison function
>> is supplied, as the following demonstrates:
>>
>> In[27]:= sameQ[_, _] = False;
>> sameQ[x_, x_] = True;
>>
>> In[29]:= data = RandomInteger[100, 2000];
>>
>> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>>
>> Out[30]= {0.000025,101}
>>
>> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[31]= {0.215696,101}
>>
>> In[32]:= a == b
>>
>> Out[32]= True
>>
>> The above only illustrates the issue; it does not solve your
>> specific problem. For your case, I can get the same result in
>> much less time using GatherBy, as illustrated below:
>>
>> In[33]:= Clear[sameQ]
>>
>> In[34]:= sameQ[_, _] = False;
>> sameQ[{x_, y_}, {x_, z_}] = True;
>>
>> In[36]:= data = RandomInteger[100, {2000, 2}];
>>
>> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[37]= {0.246448,101}
>>
>> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>>
>> Out[38]= {0.000957,101}
>>
>> In[39]:= c == b
>>
>> Out[39]= True
>>
>>
>
>

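One way to see where the time goes is to count how often the comparison function is called. This is a sketch, not a statement about the internals of DeleteDuplicates; calls and countingTest are names invented here, and the data mirrors the quoted In[36].

```mathematica
(* Hypothetical counter, incremented on every call of the test *)
calls = 0;
countingTest = Function[{a, b}, calls++; First[a] === First[b]];

data = RandomInteger[100, {2000, 2}];
unique = DeleteDuplicates[data, countingTest];

(* Number of kept elements and number of test evaluations *)
{Length[unique], calls}
```

If calls comes out far larger than Length[data], each element is being tested against many of the elements already kept, which would be consistent with the timings quoted above.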

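The GatherBy approach from the quoted message can be wrapped in a small helper. A minimal sketch, assuming the input is a list of pairs; deleteDuplicatesByFirst and small are hypothetical names, not built-ins.

```mathematica
(* Hypothetical helper: keep the first element seen for each
   distinct first coordinate *)
deleteDuplicatesByFirst[list_List] := First /@ GatherBy[list, First]

small = {{1, "a"}, {2, "b"}, {1, "c"}, {3, "d"}, {2, "e"}};

deleteDuplicatesByFirst[small]
(* {{1, "a"}, {2, "b"}, {3, "d"}} *)

(* Cross-check against the pairwise version on the same input *)
deleteDuplicatesByFirst[small] ===
 DeleteDuplicates[small, First[#1] === First[#2] &]
```

GatherBy groups elements in order of first appearance, so taking First of each group keeps the same representatives that the pairwise test would keep.
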
-- 
DrMajorBob at yahoo.com

