       Re: DeleteDuplicates is too slow?

• To: mathgroup at smc.vnet.net
• Subject: [mg107209] Re: DeleteDuplicates is too slow?
• From: Bill Rowe <readnews at sbcglobal.net>
• Date: Fri, 5 Feb 2010 03:25:03 -0500 (EST)

```
On 2/4/10 at 6:26 AM, Clint.Zeringue at kirtland.af.mil (Zeringue, Clint
M Civ USAF AFMC AFRL/RDLAF) wrote:

>Suppose you have the following:

>data = RandomReal[1, {n, 2}];

>sameQ[_, _] = False; sameQ[{x_, y_}, {x_, z_}] = True;

>Timing[DeleteDuplicates[data, sameQ]];

>If n is a large number, this takes an ungodly amount of time.

>Is there a more efficient way to delete the duplicate entries of
>data?

The slowness of DeleteDuplicates comes about when a custom
compare function is used: with the default test the kernel can
compare elements quickly, but supplying your own SameTest forces
it to evaluate the test function on pairs of elements, as the
following demonstrates:
In:= sameQ[_, _] = False;
sameQ[x_, x_] = True;

In:= data = RandomInteger[100, 2000];

In:= Timing[Length[a = DeleteDuplicates[data]]]

Out= {0.000025,101}

In:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out= {0.215696,101}

In:= a == b

Out= True

The above simply illustrates the issue; it is not a solution to
your specific problem. For your case, I can get the same result
in much less time using GatherBy, as illustrated below:

In:= Clear[sameQ]

In:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

In:= data = RandomInteger[100, {2000, 2}];

In:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out= {0.246448,101}

In:= Timing[Length[c = First /@ GatherBy[data, First]]]

Out= {0.000957,101}

In:= c == b

Out= True

```
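For readers outside Mathematica, here is a minimal Python analogue of the two approaches above (an illustrative sketch, not the Wolfram kernel's actual implementation). `dedup_pairwise` mirrors `DeleteDuplicates[data, sameQ]` by testing each candidate against every kept element, which is quadratic in the number of predicate calls; `dedup_keyed` mirrors `First /@ GatherBy[data, First]` by hashing on the key and keeping the first pair seen, which is linear in expectation. All names here are hypothetical helpers for the sketch.

```python
# Illustrative sketch: why a keyed (hashed) dedup beats a pairwise-predicate
# dedup when "duplicate" means "same first element". Names are hypothetical.
import random

random.seed(0)
data = [(random.randint(0, 100), random.randint(0, 100)) for _ in range(2000)]

def dedup_pairwise(pairs):
    """Analogue of DeleteDuplicates[data, sameQ]: compare each candidate
    against every element kept so far -- O(n^2) predicate calls."""
    kept = []
    for p in pairs:
        if not any(p[0] == q[0] for q in kept):
            kept.append(p)
    return kept

def dedup_keyed(pairs):
    """Analogue of First /@ GatherBy[data, First]: hash on the key and
    keep the first pair seen for each key -- O(n) expected time."""
    seen = {}
    for p in pairs:
        seen.setdefault(p[0], p)  # insertion order preserves first occurrences
    return list(seen.values())

# Both keep the first pair seen for each key, in first-occurrence order,
# so they agree -- just as c == b does in the Mathematica session above.
assert dedup_pairwise(data) == dedup_keyed(data)
```

The same trade-off explains the timings in the session: `GatherBy` can hash on `First`, while `DeleteDuplicates` with a custom `SameTest` has no choice but to call the test pairwise.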
