MathGroup Archive 1999

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re(2): Re: easiest way to sort a list?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg18405] Re(2): [mg18322] Re: [mg18308] easiest way to sort a list?
  • From: Andrzej Kozlowski <andrzej at tuins.ac.jp>
  • Date: Wed, 7 Jul 1999 00:11:04 -0400
  • Sender: owner-wri-mathgroup at wolfram.com

Of course I agree with all your comments except perhaps one point. I do
not know how Union works internally but  it seems that could be
implemented as follows: first remove all repetitions (e.g. by a method
similar to Carl Woll's) and then sorts the remaining, possibly  much
smaller list. Now, if it did that, and if the total number of distinct
elements to sort was small than the final sort would not make a
significant contribution to the total time taken. Union being a compiled
optimized function would then always enjoy a big advantage over any user
defined OrderedUnion that would never be cancelled by the extra time
needed to perform the final sort of only a few elements (after removing
repetitions). 
Of course this depends on the assumption that Union really only sorts
elements after removing repetitions, and that we are keeping the number
of distinct elements small while making our lists longer. I do not see
any reason why the user defined function would ever have to catch up in
this case.

If I am right than this is an interesting illustration of the difference
between Mathematica programming and theoretical computer science. The
existence of compiled built-in functions and various overheads means that
theoretical estimates of efficiency of algorithms may not only fail to
correspond to what actually happens in real life when programs have only
a finite time to run but might even be incorrect asymptotically. In fact
I am just about write another message, concerning the built in function
Split which also seems to me to illustrate the same point from a
different angle.


On Wed, Jun 30, 1999, Daniel Lichtblau <danl at wolfram.com> wrote:



>Andrzej Kozlowski wrote:
>> 
>> This is very efficient indeed and beutifully illustrates the usefullnes of
>> the amazing Sequence function. The only slight weakness that I can see is
>> when dealing with very long lists with a small number of distinct elements.
>> Since we know that the final output must have the same number of
elements as
>> given by the (very fast) built-in Union function, for such lists one may do
>> better by a code which stops checking elements of li once it has
constructed
>> a list of length Length[Union[li]]. Here is a modified code that does this:
>> 
>> In[2]:=
>> OrderedUnion1[li_]:=Block[{i,counter=0,l0={},m=Length[Union[li]]},
>>     i[n_]:=(i[n]=Sequence[];counter=counter+1;n);
>>    Scan[If[counter<m,l0={l0,i[#]},Return[Flatten[l0]]]&,li];Flatten[l0]]
>> 
>> On very long lists with a small number of distinct elements this is
somewhat
>> faster than OrderedUnion:
>> 
>> In[3]:=
>> list1=Table[Random[Integer,{1,9}],{10000}];
>> 
>> In[4]:=
>> OrderedUnion[list1];//Timing
>> Out[4]=
>> {0.166667 Second, Null}
>> 
>> In[5]:=
>> OrderedUnion1[list1];//Timing
>> Out[5]=
>> {0.0833333 Second, Null}
>> 
>> (Mac PowerBook G3 233 Mhz).
>> 
>> However, in the case of lists with very few or no repetitions the
balance is
>> naturally reversed:
>> 
>> In[6]:=
>> list2=Range[1000];
>> 
>> In[7]:=
>> OrderedUnion[list2];//Timing
>> Out[7]=
>> {0.15 Second, Null}
>> 
>> In[8]:=
>> OrderedUnion1[list2];//Timing
>> Out[8]=
>> {0.25 Second, Null}
>
>Use of Union or Sort will make the asymptotic complexity worse
>regardless of size of result, but the crossover may be so large when
>dealing with lists of machine integers or machine reals that it hardly
>matters, at least in version 4. I'd not be surprised if it is on the
>order of a few million.
>
>
>> What's worse, on my Mac if I try to apply OrderedUnion1 to
>> list2=Range[10000] I get  a crash, caused presumably by the Kernel running
>> out of memory while building up a huge nested list.
>
>This problem was brought to my attention by Colin Rose. I need to look
>into it.
>
>
>> For those with a taste for weird Mathematica programs here is an unusual
>> (and unfortunately inefficient) example:
>> 
>> In[10]:=
>> OrderedUnion2[l_]:=Module[{a},Reverse[GroebnerBasis[Map[a,l]]]/.a[k_]->k]
>> 
>> e.g.
>> 
>> In[11]:=
>> OrderedUnion2[{3,2,4,5,a,a,3,6,5,2,8}]
>> Out[11]=
>> {3, 2, 4, 5, a, 6, 8}
>
>This relies on the fact that GroebnerBasis orders its variables as they
>appear in the input. Not really safe to rely on this as it is not
>documented behavior. Specifically, default ordering is unspecified.
>
>
>> --
>> Andrzej Kozlowski
>> Toyama International University
>> JAPAN
>> http://sigma.tuins.ac.jp
>> http://eri2.tuins.ac.jp
>> 
>
>
>Daniel Lichtblau
>Wolfram Research
>
>PS I agree Carl Woll's method is really nice. I played around with the
>problem before I saw his, also had an O(n+m) method where n = size in
>and m = size out. His method gave timings consistently around 2/3 of
>mine on large problems.
>


Andrzej Kozlowski
Toyama International University
JAPAN
http://sigma.tuins.ac.jp/
http://eri2.tuins.ac.jp/



  • Prev by Date: Converting notebook to pdf ?
  • Next by Date: Re: Re: Re: easiest way to sort a list?
  • Previous by thread: Re: Converting notebook to pdf ?
  • Next by thread: Contour Integrals