Re: DeleteDuplicates is too slow?
- To: mathgroup at smc.vnet.net
- Subject: [mg107189] Re: [mg107150] DeleteDuplicates is too slow?
- From: Leonid Shifrin <lshifr at gmail.com>
- Date: Fri, 5 Feb 2010 03:21:25 -0500 (EST)
- References: <201002041126.GAA29795@smc.vnet.net>
Hi,

You can use

Tally[data][[All, 1]]

although perhaps this is not the fastest way. It seems that DeleteDuplicates is slow here because it uses the comparison function to compare each element with all the others, which takes time quadratic in the size of the dataset. It is blazingly fast on packed arrays of numbers, however.

Regards,
Leonid

On Thu, Feb 4, 2010 at 2:26 PM, Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF <Clint.Zeringue at kirtland.af.mil> wrote:

> Hello,
>
> Suppose you have the following:
>
> data = RandomReal[1, {n, 2}];
>
> sameQ[_, _] = False;
> sameQ[{x_, y_}, {x_, z_}] = True;
>
> Timing[DeleteDuplicates[data, sameQ]][[1]]
>
> If n is a large number, this takes an ungodly amount of time.
>
> Is there a more efficient way to delete the duplicate entries of data?
>
> I.e.,
>
> data = {{1., 2.}, {1., 3.}, {2., 3.}};
>
> would become:
>
> {{1., 2.}, {2., 3.}}
>
> Thanks,
>
> Clint Zeringue
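[Editor's sketch, not from the original exchange: the snippet below compares the quadratic DeleteDuplicates call against the Tally idea from the reply and against GatherBy, a suggested alternative that matches the poster's sameQ, which treats two pairs as duplicates whenever their first coordinates agree. All three functions exist as written in Mathematica 7 and later; the sample size n is arbitrary.]

    n = 20000;
    data = RandomReal[1, {n, 2}];

    sameQ[_, _] = False;
    sameQ[{x_, y_}, {x_, z_}] = True;

    (* quadratic: each element is compared against all the others *)
    First @ Timing[DeleteDuplicates[data, sameQ]]

    (* Tally-based version from the reply: removes exact duplicate pairs *)
    First @ Timing[Tally[data][[All, 1]]]

    (* grouping by the first coordinate reproduces sameQ's notion of
       "duplicate" and keeps the first pair of each group; near-linear *)
    First @ Timing[GatherBy[data, First][[All, 1]]]

Note that with random reals exact collisions in the first coordinate are vanishingly rare, so all three calls return essentially the full list; the point of the sketch is the difference in running time as n grows.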
- References:
- DeleteDuplicates is too slow?
- From: "Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF" <Clint.Zeringue@kirtland.af.mil>