Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107172] Re: DeleteDuplicates is too slow?
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Fri, 5 Feb 2010 03:18:20 -0500 (EST)
- References: <hkeaqc$t0f$1@smc.vnet.net>
On 2010.02.04. 12:25, Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF wrote:
> Hello,
>
> Suppose you have the following.
>
> Data = RandomReal[1,{N,2}];
>
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
> If N is a large number this takes an ungodly amount of time?
>
> Is there a more efficient way to delete the duplicate entries of Data?
>
> i.e.
>
> Data = {{1.,2.},{1.,3.},{2.,3.}};
>
> Would become:
> {{1.,2.},{2.,3.}};

Take care not to use N as a variable: it already has a built-in
meaning. If it is not necessary to keep the elements of the list in
their original order, then a different, lower-complexity algorithm can
be used:

SplitBy[SortBy[data, First], First][[All, 1]]

This will be much faster, but it will not remove exactly the same
elements as DeleteDuplicates. Both methods ignore the second element
of each pair when comparing, but DeleteDuplicates always keeps the
very first occurrence of each group of equivalent elements, while
after sorting a different representative may survive. Is this
important for your calculation?
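To see the speed difference, here is a rough timing sketch. The names
n, data, res1 and res2 are arbitrary choices for the example, the seed
is only there for reproducibility, and n is kept small so that the
quadratic method finishes in reasonable time; the actual timings will
of course depend on the machine:

SeedRandom[42];  (* arbitrary seed, only for reproducibility *)
n = 5000;
data = RandomReal[1, {n, 2}];

sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

(* pairwise test: on the order of n^2/2 calls to sameQ *)
First@Timing[res1 = DeleteDuplicates[data, sameQ];]

(* sort once, then keep one element from each run of equal
   first coordinates: O(n log n) comparisons *)
First@Timing[res2 = SplitBy[SortBy[data, First], First][[All, 1]];]

With random reals, collisions in the first coordinate are essentially
nonexistent, so both results will usually have length n; the point is
only how the running time grows as n increases.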
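And a small illustration of the behavioral difference mentioned above
(exactly which element of a tied group survives after SortBy depends
on how ties are broken, but in general it will not be the first
occurrence from the original list):

small = {{1., 3.}, {2., 5.}, {1., 2.}};

sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

DeleteDuplicates[small, sameQ]
(* {{1., 3.}, {2., 5.}} -- the first occurrence, {1., 3.}, is kept *)

SplitBy[SortBy[small, First], First][[All, 1]]
(* e.g. {{1., 2.}, {2., 5.}} -- after sorting, {1., 2.} rather than
   {1., 3.} may be the survivor, and the result comes out sorted *)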