Re: DeleteDuplicates is too slow?

• To: mathgroup at smc.vnet.net
• Subject: [mg107172] Re: DeleteDuplicates is too slow?
• From: Szabolcs Horvát <szhorvat at gmail.com>
• Date: Fri, 5 Feb 2010 03:18:20 -0500 (EST)
• References: <hkeaqc$t0f$1@smc.vnet.net>

```
On 2010.02.04. 12:25, Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF wrote:
> Hello,
>
> Suppose you have the following.
>
> Data = RandomReal[1,{N,2}];
>
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
> If N is a large number this takes an ungodly amount of time?
>
> Is there a more efficient way to delete the duplicate entries of Data ?
>
> ie.
>
> Data = {{1.,2.},{1.,3.},{2.,3.}};
>
> Would become:
> {{1.,2.},{ 2.,3.}};
>

Take care not to use N as a variable as it already has a built-in meaning.

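If it is not necessary to keep the elements of the list in the same
order, then a different, lower-complexity algorithm can be used.
DeleteDuplicates with a custom test has to compare pairs of elements,
while sorting first needs only O(n log n) comparisons: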

SplitBy[SortBy[data, First], First][[All, 1]]

This will be much faster, but it will not necessarily keep the same
elements as DeleteDuplicates.  DeleteDuplicates always keeps the very
first occurrence of equivalent elements, whereas here the pair kept from
each group is whichever one comes first after sorting, and the result is
ordered by first element.  Is this important for your calculation?

```
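
A minimal sketch of the two approaches on a small list, to show which pairs each one keeps. The test data and the name sameFirstQ are made up for illustration and are not from the original post. The second result assumes that SortBy breaks ties between equal first elements using the canonical order of the whole pairs, as its documentation describes.

```mathematica
(* Illustrative data: the first elements 2. and 1. each occur twice *)
data = {{2., 9.}, {1., 5.}, {2., 1.}, {1., 3.}};

(* Hypothetical helper: two pairs count as duplicates if their first elements agree *)
sameFirstQ[a_, b_] := First[a] == First[b]

(* Pairwise comparison: keeps the first occurrence, original order preserved *)
DeleteDuplicates[data, sameFirstQ]
(* {{2., 9.}, {1., 5.}} *)

(* Sort, split into runs with equal first element, take the head of each run *)
SplitBy[SortBy[data, First], First][[All, 1]]
(* {{1., 3.}, {2., 1.}} *)
```

Both results contain one pair per distinct first element, but not the same pairs and not in the same order, so the sort-based version is only a drop-in replacement when neither of those matters.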
