Re: Re: Re: RE: Re: Re: easiest way to sort a list?

*To*: mathgroup at smc.vnet.net*Subject*: [mg18778] Re: [mg18722] Re: [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to sort a list?*From*: "Andrzej Kozlowski" <andrzej at lineone.net>*Date*: Tue, 20 Jul 1999 01:33:33 -0400*Sender*: owner-wri-mathgroup at wolfram.com

This looked like a possible explanation until I run my tests again in reverse order (the definitions of Ordered[i] are as in the original message) In[6]:= l= Table[Random[Integer, {1, 5000}], {10000}]; In[7]:= Table[OrderedUnion[i][l]; // Timing, {i, 5, 1, -1}] Out[7]= {{5.06667 Second, Null}, {392.217 Second, Null}, {37.75 Second, Null}, {2.01667 Second, Null}, {1.06667 Second, Null}} So the very different performance of OrderedUnion[4] and OrderedUnion[2] on our machines must probably be accounted for by some differences between Mathematica 3 and 4. OrderedUnion[2] seems to run buch better under version 4 without crashes or slow downs, but OrderedUnion[2] (at least on the Mac) runs dreadfully slowly. -- Andrzej Kozlowski Toyama International University JAPAN http://sigma.tuins.ac.jp http://eri2.tuins.ac.jp ---------- >From: "Michael J. Sharpe" <msharpe at ucsd.edu> To: mathgroup at smc.vnet.net >To: mathgroup at smc.vnet.net >Subject: [mg18778] [mg18722] Re: [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to sort a list? >Date: Sat, Jul 17, 1999, 7:36 AM > > I'm using Mathematica 3.0 on a G3 266. If I run just OrderedUnion[4] on > Table[Random[Integer, {1, 5000}], {10000}], I get times comparable for those with > OrderedUnion[1]. However, when I run OrderedUnion[2] on this particular data, the > kernel invariably freezes the machine or crashes it with a "bus error". On > smaller data sets, if it doesn't crash the machine, it slows subsequent > operations dramatically. Perhaps the anomolous timing you observed with > OrderedUnion[4] in Mathematica 4.0 is caused by some similar effect. > > I believe the problem can be laid at the feet of recursion depth in the > construction of l0 in OrderedUnion[2]. After running the following similar code, > my machine as soon as I try to access li, say by asking for li[[1]]. > > li={}; > For[k=1,k<=5000,k++,li={li,k}]; > Flatten[li]; > > Michael Sharpe > > Andrzej Kozlowski wrote: > >> Actually, OrderedUnion[4] is dreadfully slow on my Mac for this last >> list. Here is what I get: >> >> In[16]:= >> Table[OrderedUnion[i][l3]; // Timing, {i, 1, 4}] >> Out[16]= >> {{1.16667 Second, Null}, {2.15 Second, Null}, {38.5833 Second, >> Null}, {396.433 Second, Null}} >> >> And yet Sort and Union are indeed in version 4. You can see this when I >> add Colin Rose's function to the test list: >> >> OrderedUnion[5][lis_] := >> >> Part[lis, Sort[Map[Position[lis, #][[1]] &, Union[lis]] // Flatten]] >> >> Running all five in a fresh Mathematica 4.0 session I get: >> >> In[6]:= >> l = Table[Random[Integer, {1, 5000}], {10000}]; >> >> In[7]:= >> Table[OrderedUnion[i][l]; // Timing, {i, 1, 5}] >> Out[7]= >> {{1.16667 Second, Null}, {2.06667 Second, Null}, {36.3 Second, >> Null}, {403.417 Second, Null}, {4.93333 Second, Null}} >> >> with OrderedUnion[4] doing slightly worse than before. For a list with >> only a few digits I get: >> >> In[8]:= >> l1 = Table[Random[Integer, {1, 50}], {10000}]; >> >> In[9]:= >> Table[OrderedUnion[i][l1]; // Timing, {i, 1, 5}] >> Out[9]= >> {{0.166667 Second, Null}, {0.05 Second, Null}, {1.01667 Second, >> Null}, {0.566667 Second, Null}, {1.16667 Second, Null}} >> >> So on my machine it is only for such types of lists that OrderedUnion[4] >> does relatively well. There may be something like the Split problem >> involved here, but I do not think it's wrth the time and effort it would >> take to investigate this. I am now inclined to think that cross-platform >> timings in Mathematica are almost meaningless. >> >> On Sat, Jul 10, 1999, Carl K.Woll <carlw at fermi.phys.washington.edu> wrote: >> >> >Andrzej, >> > >> >Your results are very different from those I get with Mathematica V3.0 >> on a 100 >> >MHz PC >> >with Windows 95. For your l2 list, I got >> > >> >{{1.93 Second, Null}, {6.04 Second, Null}, >> > {43.12 Second, Null}, {5.65 Second, Null}} >> > >> >If I try using the following list >> > >> >Table[Random[Integer, {1, 5000}], {10000}]; >> > >> >I get the following timings >> > >> >{{7.47 Second, Null}, {14.4 Second, Null}, >> > {177.24 Second, Null}, {6.98 Second, Null}} >> > >> >So, for some reason OrderedUnion[4] is much slower on your machine than it is >> >on mine. >> >That is, for the list l2, OrderedUnion[4] is 66 times slower than >> >OrderedUnion[1] on your >> >machine, but it is only 3 times slower on my machine. This is odd, since >> >OrderedUnion[4] >> >uses Sort and Union, and I thought Colin Rose suggested that Sort and >> Union are >> >much >> >faster in Version 4.0 than in Version 3.0. Hence, I would have thought that >> >OrderedUnion[4] would have worked even better on your machine. >> > >> >Carl Woll >> >Dept of Physics >> >U of Washington >> > >> >Andrzej Kozlowski wrote: >> > >> >> Here are some more results on my PowerBook G3 (233 mghz, Mathematica 4.0): >> >> >> >> In[1]:= >> >> OrderedUnion[1][li_] := >> >> Block[{i, Sequence}, >> >> i[n_] := (i[n] = Sequence[]; n); >> >> i /@ li] >> >> >> >> OrderedUnion[2][li_] := >> >> Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence}, >> >> i[n_] := (i[n] = Sequence[]; counter = counter + 1; n); >> >> Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li]; >> >> Flatten[l0]] >> >> >> >> OrderedUnion[3][h_[args__]] := >> >> Fold[ If[ FreeQ[#1, #2, 1], Join[#1, h[#2]], #1] & , h[], {args}] >> >> >> >> OrderedUnion[4][li_] := >> >> Module[{tmp = Transpose[{li, Range[Length[li]]}]}, >> >> tmp = Union[tmp, SameTest -> (#1[[1]] === #2[[1]] &)]; >> >> Last /@ Sort[RotateLeft /@ tmp]] >> >> >> >> First a long list with a small number of distinnct elements: >> >> >> >> In[5]:= >> >> l1 = Table[Random[Integer, {1, 10}], {10000}]; >> >> >> >> In[6]:= >> >> Table[OrderedUnion[i][l1]; // Timing, {i, 1, 4}] >> >> Out[6]= >> >> {{0.183333 Second, Null}, {0.0333333 Second, Null}, >> >> >> >> {0.75 Second, Null}, {0.533333 Second, Null}} >> >> >> >> Next a long list with quite many distinct elements: >> >> >> >> l2 = Table[Random[Integer, {1, 1000}], {10000}]; >> >> >> >> In[8]:= >> >> Table[OrderedUnion[i][l2]; // Timing, {i, 1, 4}] >> >> Out[8]= >> >> {{0.3 Second, Null}, {1.05 Second, Null}, >> >> >> >> {8.48333 Second, Null}, {22.0333 Second, Null}} >> >> >> >> -- >> >> Andrzej Kozlowski >> >> Toyama International University >> >> JAPAN >> >> http://sigma.tuins.ac.jp >> >> http://eri2.tuins.ac.jp >> >> >> >> ---------- >> >> >From: "Ersek, Ted R" <ErsekTR at navair.navy.mil> To: mathgroup at smc.vnet.net > To: mathgroup at smc.vnet.net >> >> To: mathgroup at smc.vnet.net >> >> >To: mathgroup at smc.vnet.net >> >> >Subject: [mg18778] [mg18722] [mg18556] [mg18525] RE: [mg18428] Re: [mg18344] > Re: [mg18308] >> >easiest way to >> >> sort a list? >> >> >Date: Fri, Jul 9, 1999, 11:32 AM >> >> > >> >> >> >> > I have a new approach to add to this long thread. >> >> > >> >> > In[1]:= >> >> > OrderedUnion5[h_[args__]]:= >> >> > Fold[ If[ FreeQ[#1,#2,1], Join[#1,h[#2]], #1]& ,h[],{args}] >> >> > >> >> > In[2]:= >> >> > OrderedUnion5[{5,3,5,3,1,2,2,3,4,5,3}] >> >> > >> >> > Out[2]= >> >> > {5,3,1,2,4} >> >> > >> >> > >> >> > I would like to see how the timings of different methods compare (version >> >> > above included). I would do it, but I am working under Windows 98, and I >> >> > understand Microsoft made it impossible for Timing to give accurate >> >results. >> >> > >> >> > Regards, >> >> > Ted Ersek >> >> > >> >> > -------------------- >> >> > Hi Hartmut (and newsgroup), >> >> > >> >> > Thanks for unraveling the linear nature of the Block[{i... function. Nice >> >> > explanation! >> >> > >> >> > Since this topic shows no sign of slowing down, I thought I would present >> >a >> >> > couple refined versions of my two functions. First, an improvement of the >> >> > "linear" solution: >> >> > >> >> > OrderedUnion[li_]:=Block[{i,Sequence}, >> >> > i[n_]:=(i[n]=Sequence[];n); >> >> > i/@li] >> >> > >> >> > By including Sequence in the Block variables, Mathematica doesn't waste >> >time >> >> > eliminating Sequences until after the Block is exited. This seems to be a >> >> > 5-10% or so improvement in speed. >> >> > >> >> > Secondly, an improvement of my O(n log n) solution: >> >> > >> >> > OrderedUnion[3][li_]:=Module[{tmp=Transpose[{li,Range[Length[li]]}]}, >> >> > tmp=Union[tmp, SameTest->(#1[[1]]===#2[[1]]&)]; >> >> > Last/@Sort[RotateLeft/@tmp]] >> >> > >> >> > The key idea here is that Sort with the default ordering function is much >> >> > faster than Sort with a user supplied ordering function. So, RotateLeft >> >> > switches the order of the index and the element to allow Sort to be used >> >> > with its default ordering function. For lists with a large number of >> >> > distinct elements, this seems to improve the speed by a factor of 3. >> >> > >> >> > Carl Woll >> >> > Dept of Physics >> >> > U of Washington >> >> > >> >> > Wolf, Hartmut wrote: >> >> > >> >> >> Hello everone, >> >> >> >> >> >> I just sent a letter to Carl K.Woll, part of which I would like to >> >> >> publish to the community: >> >> >> >> >> >> Hello Carl, >> >> >> >> >> >> congratulations for your brilliant solution of "easiest way to sort a >> >> >> list?" At the first glance, ... I was already struck. >> >> >> >> >> >> According to my measurements, you proposed not only the fastest >> >> >> solution, but also the second fastest, the fastest O[n log n] version. >> >> >> .... >> >> >> >> >> >> The question that concerned me was, is your solution really O[n]? My >> >> >> preoccupation before was that O[n log n] could be the best achievable. >> >> >> Clearly your solution is O[n] for 'smaller' samples, but the question >> >> >> was, is it still O[n] for very large samples? So I did some >> >> >> measurements, which I'd like to show you. They are in the two attached >> >> >> notebooks (...not for you, unless you request it from >> >> >> mailto:hwolf at debis.com ) >> >> >> >> >> >> What is the interpretation of the graphs? In fact your solution is as >> >> >> far as I could calculate seemingly linear overall. There are interesting >> >> >> wiggles in it. The algorithm seems to become increasingly worse, when it >> >> >> makes up it's mind and "starts anew". Well I think this are the traces >> >> >> of Mathematica's Extendible Hashing algorithm for the symbol store. When >> >> >> hashing becomes more and more inefficient (due to collisions) then the >> >> >> store ist extended. (This conforms to my observation of memory usage >> >> >> with Windows NT Task Manager during the calculation, where I say periods >> >> >> of rapid increase of used memory and times of slower grow.) So it's this >> >> >> extension, which keeps your algorithm linear. >> >> >> >> >> >> .... >> >> >> >> >> >> With kind regards, your >> >> >> Hartmut Wolf >> >> > >> > >> > >> > >> > >> > >> >> Andrzej Kozlowski >> Toyama International University >> JAPAN >> http://sigma.tuins.ac.jp/ >> http://eri2.tuins.ac.jp/ >