Re: Re: RE: Re: Re: easiest way to sort a list?

*To*: mathgroup at smc.vnet.net*Subject*: [mg18589] Re: [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to sort a list?*From*: "Carl K.Woll" <carlw at fermi.phys.washington.edu>*Date*: Tue, 13 Jul 1999 01:01:24 -0400*Organization*: Department of Physics*References*: <199907100618.CAA02996@smc.vnet.net.>*Sender*: owner-wri-mathgroup at wolfram.com

Andrzej, Your results are very different from those I get with Mathematica V3.0 on a 100 MHz PC with Windows 95. For your l2 list, I got {{1.93 Second, Null}, {6.04 Second, Null}, {43.12 Second, Null}, {5.65 Second, Null}} If I try using the following list Table[Random[Integer, {1, 5000}], {10000}]; I get the following timings {{7.47 Second, Null}, {14.4 Second, Null}, {177.24 Second, Null}, {6.98 Second, Null}} So, for some reason OrderedUnion[4] is much slower on your machine than it is on mine. That is, for the list l2, OrderedUnion[4] is 66 times slower than OrderedUnion[1] on your machine, but it is only 3 times slower on my machine. This is odd, since OrderedUnion[4] uses Sort and Union, and I thought Colin Rose suggested that Sort and Union are much faster in Version 4.0 than in Version 3.0. Hence, I would have thought that OrderedUnion[4] would have worked even better on your machine. Carl Woll Dept of Physics U of Washington Andrzej Kozlowski wrote: > Here are some more results on my PowerBook G3 (233 mghz, Mathematica 4.0): > > In[1]:= > OrderedUnion[1][li_] := > Block[{i, Sequence}, > i[n_] := (i[n] = Sequence[]; n); > i /@ li] > > OrderedUnion[2][li_] := > Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence}, > i[n_] := (i[n] = Sequence[]; counter = counter + 1; n); > Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li]; > Flatten[l0]] > > OrderedUnion[3][h_[args__]] := > Fold[ If[ FreeQ[#1, #2, 1], Join[#1, h[#2]], #1] & , h[], {args}] > > OrderedUnion[4][li_] := > Module[{tmp = Transpose[{li, Range[Length[li]]}]}, > tmp = Union[tmp, SameTest -> (#1[[1]] === #2[[1]] &)]; > Last /@ Sort[RotateLeft /@ tmp]] > > First a long list with a small number of distinnct elements: > > In[5]:= > l1 = Table[Random[Integer, {1, 10}], {10000}]; > > In[6]:= > Table[OrderedUnion[i][l1]; // Timing, {i, 1, 4}] > Out[6]= > {{0.183333 Second, Null}, {0.0333333 Second, Null}, > > {0.75 Second, Null}, {0.533333 Second, Null}} > > Next a long list with quite many distinct elements: > > l2 = Table[Random[Integer, {1, 1000}], {10000}]; > > In[8]:= > Table[OrderedUnion[i][l2]; // Timing, {i, 1, 4}] > Out[8]= > {{0.3 Second, Null}, {1.05 Second, Null}, > > {8.48333 Second, Null}, {22.0333 Second, Null}} > > -- > Andrzej Kozlowski > Toyama International University > JAPAN > http://sigma.tuins.ac.jp > http://eri2.tuins.ac.jp > > ---------- > >From: "Ersek, Ted R" <ErsekTR at navair.navy.mil> > To: mathgroup at smc.vnet.net > >To: mathgroup at smc.vnet.net > >Subject: [mg18589] [mg18556] [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to > sort a list? > >Date: Fri, Jul 9, 1999, 11:32 AM > > > > > I have a new approach to add to this long thread. > > > > In[1]:= > > OrderedUnion5[h_[args__]]:= > > Fold[ If[ FreeQ[#1,#2,1], Join[#1,h[#2]], #1]& ,h[],{args}] > > > > In[2]:= > > OrderedUnion5[{5,3,5,3,1,2,2,3,4,5,3}] > > > > Out[2]= > > {5,3,1,2,4} > > > > > > I would like to see how the timings of different methods compare (version > > above included). I would do it, but I am working under Windows 98, and I > > understand Microsoft made it impossible for Timing to give accurate results. > > > > Regards, > > Ted Ersek > > > > -------------------- > > Hi Hartmut (and newsgroup), > > > > Thanks for unraveling the linear nature of the Block[{i... function. Nice > > explanation! > > > > Since this topic shows no sign of slowing down, I thought I would present a > > couple refined versions of my two functions. First, an improvement of the > > "linear" solution: > > > > OrderedUnion[li_]:=Block[{i,Sequence}, > > i[n_]:=(i[n]=Sequence[];n); > > i/@li] > > > > By including Sequence in the Block variables, Mathematica doesn't waste time > > eliminating Sequences until after the Block is exited. This seems to be a > > 5-10% or so improvement in speed. > > > > Secondly, an improvement of my O(n log n) solution: > > > > OrderedUnion[3][li_]:=Module[{tmp=Transpose[{li,Range[Length[li]]}]}, > > tmp=Union[tmp, SameTest->(#1[[1]]===#2[[1]]&)]; > > Last/@Sort[RotateLeft/@tmp]] > > > > The key idea here is that Sort with the default ordering function is much > > faster than Sort with a user supplied ordering function. So, RotateLeft > > switches the order of the index and the element to allow Sort to be used > > with its default ordering function. For lists with a large number of > > distinct elements, this seems to improve the speed by a factor of 3. > > > > Carl Woll > > Dept of Physics > > U of Washington > > > > Wolf, Hartmut wrote: > > > >> Hello everone, > >> > >> I just sent a letter to Carl K.Woll, part of which I would like to > >> publish to the community: > >> > >> Hello Carl, > >> > >> congratulations for your brilliant solution of "easiest way to sort a > >> list?" At the first glance, ... I was already struck. > >> > >> According to my measurements, you proposed not only the fastest > >> solution, but also the second fastest, the fastest O[n log n] version. > >> .... > >> > >> The question that concerned me was, is your solution really O[n]? My > >> preoccupation before was that O[n log n] could be the best achievable. > >> Clearly your solution is O[n] for 'smaller' samples, but the question > >> was, is it still O[n] for very large samples? So I did some > >> measurements, which I'd like to show you. They are in the two attached > >> notebooks (...not for you, unless you request it from > >> mailto:hwolf at debis.com ) > >> > >> What is the interpretation of the graphs? In fact your solution is as > >> far as I could calculate seemingly linear overall. There are interesting > >> wiggles in it. The algorithm seems to become increasingly worse, when it > >> makes up it's mind and "starts anew". Well I think this are the traces > >> of Mathematica's Extendible Hashing algorithm for the symbol store. When > >> hashing becomes more and more inefficient (due to collisions) then the > >> store ist extended. (This conforms to my observation of memory usage > >> with Windows NT Task Manager during the calculation, where I say periods > >> of rapid increase of used memory and times of slower grow.) So it's this > >> extension, which keeps your algorithm linear. > >> > >> .... > >> > >> With kind regards, your > >> Hartmut Wolf > >

**References**:**Re: RE: Re: Re: easiest way to sort a list?***From:*"Andrzej Kozlowski" <andrzej@tuins.ac.jp>