Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107188] Re: [mg107150] DeleteDuplicates is too slow?
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Fri, 5 Feb 2010 03:21:14 -0500 (EST)
- References: <201002041126.GAA29795@smc.vnet.net>
Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF wrote: > Hello, > > Suppose you have the following. > > Data = RandomReal[1,{N,2}]; > > sameQ[_,_]=False; > sameQ[{x_,y_},{x_,z_}]=True; > > Timing[DeleteDuplicates[data,sameQ]][[1]]; > > If N is a large number this takes an ungodly amount of time? > > Is there a more efficient way to delete the duplicate entries of Data ? > > ie. > > Data = {{1.,2.},{1.,3.},{2.,3.}}; > > Would become: > {{1.,2.},{ 2.,3.}}; > > > Thanks, > > Clint Zeringue DeleteDuplicates is not able to recognize that there might be a fast way to use your sameQ, hence it is an O(n^2) proposition when list length is n. The variant below will behave better. deldupes[ll_] := Module[{h, res}, res = Reap[Do[ If[! TrueQ[h[ll[[j, 1]]]], h[ll[[j, 1]]] = True; Sow[ll[[j]]]]; , {j, Length[ll]}] ][[2, 1]]; Clear[h]; res] Example: data = RandomInteger[10^3, {10^4, 2}]; In[27]:= Timing[dd1 = deldupes[data];] Out[27]= {0.029996, Null} In[28]:= Timing[dd2 = DeleteDuplicates[data, sameQ];] Out[28]= {6.28405, Null} In[31]:= dd2 // Length Out[31]= 1001 In[32]:= dd1 === dd2 Out[32]= True Daniel Lichtblau Wolfram Research
- References:
- DeleteDuplicates is too slow?
- From: "Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF" <Clint.Zeringue@kirtland.af.mil>
- DeleteDuplicates is too slow?