MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: DeleteDuplicates is too slow?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107188] Re: [mg107150] DeleteDuplicates is too slow?
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Fri, 5 Feb 2010 03:21:14 -0500 (EST)
  • References: <201002041126.GAA29795@smc.vnet.net>

Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF wrote:
> Hello,
> 
> Suppose you have the following.
> 
> Data = RandomReal[1,{N,2}];
> 
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
> 
> Timing[DeleteDuplicates[data,sameQ]][[1]];
> 
> If N is a large number this takes an ungodly amount of time?
> 
> Is there a more efficient way to delete the duplicate entries of Data ?
> 
> ie.
> 
> Data = {{1.,2.},{1.,3.},{2.,3.}};
> 
> Would become:
> {{1.,2.},{ 2.,3.}};
> 
> 
> Thanks,
> 
> Clint Zeringue

DeleteDuplicates is not able to recognize that there might be a fast way 
to use your sameQ, hence it is an O(n^2) proposition when list length is 
n. The variant below will behave better.

deldupes[ll_] := Module[{h, res},
   res = Reap[Do[
       If[! TrueQ[h[ll[[j, 1]]]], h[ll[[j, 1]]] = True; Sow[ll[[j]]]];
       , {j, Length[ll]}]
      ][[2, 1]];
   Clear[h];
   res]

Example:

data = RandomInteger[10^3, {10^4, 2}];

In[27]:= Timing[dd1 = deldupes[data];]
Out[27]= {0.029996, Null}

In[28]:= Timing[dd2 = DeleteDuplicates[data, sameQ];]
Out[28]= {6.28405, Null}

In[31]:= dd2 // Length
Out[31]= 1001

In[32]:= dd1 === dd2
Out[32]= True

Daniel Lichtblau
Wolfram Research


  • Prev by Date: Re: Bug? Analytical integration of cosines gets the sign wrong
  • Next by Date: Re: Bug? Analytical integration of cosines gets the sign wrong
  • Previous by thread: Re: DeleteDuplicates is too slow?
  • Next by thread: Re: DeleteDuplicates is too slow?