Re(2): Re: easiest way to sort a list?

*To*: mathgroup at smc.vnet.net*Subject*: [mg18405] Re(2): [mg18322] Re: [mg18308] easiest way to sort a list?*From*: Andrzej Kozlowski <andrzej at tuins.ac.jp>*Date*: Wed, 7 Jul 1999 00:11:04 -0400*Sender*: owner-wri-mathgroup at wolfram.com

Of course I agree with all your comments except perhaps one point. I do not know how Union works internally but it seems that could be implemented as follows: first remove all repetitions (e.g. by a method similar to Carl Woll's) and then sorts the remaining, possibly much smaller list. Now, if it did that, and if the total number of distinct elements to sort was small than the final sort would not make a significant contribution to the total time taken. Union being a compiled optimized function would then always enjoy a big advantage over any user defined OrderedUnion that would never be cancelled by the extra time needed to perform the final sort of only a few elements (after removing repetitions). Of course this depends on the assumption that Union really only sorts elements after removing repetitions, and that we are keeping the number of distinct elements small while making our lists longer. I do not see any reason why the user defined function would ever have to catch up in this case. If I am right than this is an interesting illustration of the difference between Mathematica programming and theoretical computer science. The existence of compiled built-in functions and various overheads means that theoretical estimates of efficiency of algorithms may not only fail to correspond to what actually happens in real life when programs have only a finite time to run but might even be incorrect asymptotically. In fact I am just about write another message, concerning the built in function Split which also seems to me to illustrate the same point from a different angle. On Wed, Jun 30, 1999, Daniel Lichtblau <danl at wolfram.com> wrote: >Andrzej Kozlowski wrote: >> >> This is very efficient indeed and beutifully illustrates the usefullnes of >> the amazing Sequence function. The only slight weakness that I can see is >> when dealing with very long lists with a small number of distinct elements. >> Since we know that the final output must have the same number of elements as >> given by the (very fast) built-in Union function, for such lists one may do >> better by a code which stops checking elements of li once it has constructed >> a list of length Length[Union[li]]. Here is a modified code that does this: >> >> In[2]:= >> OrderedUnion1[li_]:=Block[{i,counter=0,l0={},m=Length[Union[li]]}, >> i[n_]:=(i[n]=Sequence[];counter=counter+1;n); >> Scan[If[counter<m,l0={l0,i[#]},Return[Flatten[l0]]]&,li];Flatten[l0]] >> >> On very long lists with a small number of distinct elements this is somewhat >> faster than OrderedUnion: >> >> In[3]:= >> list1=Table[Random[Integer,{1,9}],{10000}]; >> >> In[4]:= >> OrderedUnion[list1];//Timing >> Out[4]= >> {0.166667 Second, Null} >> >> In[5]:= >> OrderedUnion1[list1];//Timing >> Out[5]= >> {0.0833333 Second, Null} >> >> (Mac PowerBook G3 233 Mhz). >> >> However, in the case of lists with very few or no repetitions the balance is >> naturally reversed: >> >> In[6]:= >> list2=Range[1000]; >> >> In[7]:= >> OrderedUnion[list2];//Timing >> Out[7]= >> {0.15 Second, Null} >> >> In[8]:= >> OrderedUnion1[list2];//Timing >> Out[8]= >> {0.25 Second, Null} > >Use of Union or Sort will make the asymptotic complexity worse >regardless of size of result, but the crossover may be so large when >dealing with lists of machine integers or machine reals that it hardly >matters, at least in version 4. I'd not be surprised if it is on the >order of a few million. > > >> What's worse, on my Mac if I try to apply OrderedUnion1 to >> list2=Range[10000] I get a crash, caused presumably by the Kernel running >> out of memory while building up a huge nested list. > >This problem was brought to my attention by Colin Rose. I need to look >into it. > > >> For those with a taste for weird Mathematica programs here is an unusual >> (and unfortunately inefficient) example: >> >> In[10]:= >> OrderedUnion2[l_]:=Module[{a},Reverse[GroebnerBasis[Map[a,l]]]/.a[k_]->k] >> >> e.g. >> >> In[11]:= >> OrderedUnion2[{3,2,4,5,a,a,3,6,5,2,8}] >> Out[11]= >> {3, 2, 4, 5, a, 6, 8} > >This relies on the fact that GroebnerBasis orders its variables as they >appear in the input. Not really safe to rely on this as it is not >documented behavior. Specifically, default ordering is unspecified. > > >> -- >> Andrzej Kozlowski >> Toyama International University >> JAPAN >> http://sigma.tuins.ac.jp >> http://eri2.tuins.ac.jp >> > > >Daniel Lichtblau >Wolfram Research > >PS I agree Carl Woll's method is really nice. I played around with the >problem before I saw his, also had an O(n+m) method where n = size in >and m = size out. His method gave timings consistently around 2/3 of >mine on large problems. > Andrzej Kozlowski Toyama International University JAPAN http://sigma.tuins.ac.jp/ http://eri2.tuins.ac.jp/