MathGroup Archive 2010

Re: DeleteDuplicates is too slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107212] Re: [mg107150] DeleteDuplicates is too slow?
  • From: Tomas Garza <tgarza10 at msn.com>
  • Date: Fri, 5 Feb 2010 03:25:36 -0500 (EST)
  • References: <201002041126.GAA29795@smc.vnet.net>

Use Tally or, even better, GatherBy to obtain very substantial reductions in time:

In[1]:= data=RandomInteger[{1,99},{100000,2}];

In[2]:=
sameQ[_,_]=False;
sameQ[{x_,y_},{x_,z_}]=True;

In[4]:= Timing[t0=DeleteDuplicates[data,sameQ];]
Out[4]= {7.987,Null}

In[5]:= Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]
Out[5]= 0.063

In[6]:= Timing[t2=#[[1]]&/@GatherBy[data,First];][[1]]
Out[6]= 0.016

In[7]:= t0===t1===t2
Out[7]= True
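
The speed-up is presumably because DeleteDuplicates with a custom test has to compare elements pairwise, whereas GatherBy evaluates the key (here First) once per element. As a minimal sketch, the GatherBy approach can be wrapped in a small helper and applied to the example from the original question. The name firstByKey is hypothetical and chosen only for illustration, and DeleteDuplicatesBy assumes Mathematica 10 or later (it did not exist when this thread was written).

```mathematica
(* Hypothetical helper, not from the original post: keep the first row
   seen for each distinct first element, preserving order of appearance. *)
firstByKey[list_List] := First /@ GatherBy[list, First]

firstByKey[{{1., 2.}, {1., 3.}, {2., 3.}}]
(* {{1., 2.}, {2., 3.}} *)

(* Assumes Mathematica 10 or later: the same result in a single call. *)
DeleteDuplicatesBy[{{1., 2.}, {1., 3.}, {2., 3.}}, First]
(* {{1., 2.}, {2., 3.}} *)
```

Both forms keep the first occurrence for each key, which matches what the sameQ-based DeleteDuplicates call returns in the session above.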

Tomas


> Date: Thu, 4 Feb 2010 06:26:02 -0500
> From: Clint.Zeringue at kirtland.af.mil
> Subject: [mg107150] DeleteDuplicates  is too slow?
> To: mathgroup at smc.vnet.net
>
> Hello,
>
> Suppose you have the following.
>
> data = RandomReal[1,{n,2}];
>
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]]
>
> If n is a large number, this takes an ungodly amount of time.
>
> Is there a more efficient way to delete the duplicate entries of data?
>
> i.e.
>
> data = {{1.,2.},{1.,3.},{2.,3.}};
>
> would become:
> {{1.,2.},{2.,3.}}
>
>
> Thanks,
>
>
> Clint Zeringue
>

