Mathematica 9 is now available
Services & Resources / Wolfram Forums
-----
 /
MathGroup Archive
1999
*January
*February
*March
*April
*May
*June
*July
*August
*September
*October
*November
*December
*Archive Index
*Ask about this page
*Print this page
*Give us feedback
*Sign up for the Wolfram Insider

MathGroup Archive 1999

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: Re: RE: Re: Re: easiest way to sort a list?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg18778] Re: [mg18722] Re: [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to sort a list?
  • From: "Andrzej Kozlowski" <andrzej at lineone.net>
  • Date: Tue, 20 Jul 1999 01:33:33 -0400
  • Sender: owner-wri-mathgroup at wolfram.com

This looked like a possible explanation until I run my tests again in 
reverse order (the definitions of Ordered[i] are as in the original message)


In[6]:=
l= Table[Random[Integer, {1, 5000}], {10000}];
In[7]:=
Table[OrderedUnion[i][l]; // Timing, {i, 5, 1, -1}]
Out[7]=
{{5.06667 Second, Null}, {392.217 Second, Null}, {37.75 Second,
    Null}, {2.01667 Second, Null}, {1.06667 Second, Null}}

So the very different performance of OrderedUnion[4] and OrderedUnion[2] on
our machines must probably be accounted for by some differences between
Mathematica 3 and 4. OrderedUnion[2] seems to run buch better under version
4 without crashes or slow downs, but OrderedUnion[2] (at least on the Mac)
runs dreadfully slowly.
--
Andrzej Kozlowski
Toyama International University
JAPAN
http://sigma.tuins.ac.jp
http://eri2.tuins.ac.jp


----------
>From: "Michael J. Sharpe" <msharpe at ucsd.edu>
To: mathgroup at smc.vnet.net
>To: mathgroup at smc.vnet.net
>Subject: [mg18778] [mg18722] Re: [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re:
[mg18308] easiest way to sort a list?
>Date: Sat, Jul 17, 1999, 7:36 AM
>

> I'm using Mathematica 3.0 on a G3 266. If I run just OrderedUnion[4] on
> Table[Random[Integer, {1, 5000}], {10000}], I get times comparable for those
with
> OrderedUnion[1]. However, when I run OrderedUnion[2] on this particular data,
the
> kernel invariably freezes the machine or crashes it with a "bus error". On
> smaller data sets, if it doesn't crash the machine, it slows subsequent
> operations dramatically. Perhaps the anomolous timing you observed with
> OrderedUnion[4] in Mathematica 4.0 is caused by some similar effect.
>
> I believe the problem can be laid at the feet of recursion depth in the
> construction of l0 in OrderedUnion[2]. After running the following similar
code,
> my machine as soon as I try to access li, say by asking for li[[1]].
>
> li={};
> For[k=1,k<=5000,k++,li={li,k}];
> Flatten[li];
>
> Michael Sharpe
>
> Andrzej Kozlowski wrote:
>
>> Actually, OrderedUnion[4] is dreadfully slow on my Mac for this last
>> list. Here is what I get:
>>
>> In[16]:=
>> Table[OrderedUnion[i][l3]; // Timing, {i, 1, 4}]
>> Out[16]=
>> {{1.16667 Second, Null}, {2.15 Second, Null}, {38.5833 Second,
>>     Null}, {396.433 Second, Null}}
>>
>> And yet Sort and Union are indeed in version 4. You can see this when I
>> add Colin Rose's function to the test list:
>>
>> OrderedUnion[5][lis_] :=
>>
>>   Part[lis, Sort[Map[Position[lis, #][[1]] &, Union[lis]] // Flatten]]
>>
>> Running all five in a fresh Mathematica 4.0 session I get:
>>
>> In[6]:=
>> l = Table[Random[Integer, {1, 5000}], {10000}];
>>
>> In[7]:=
>> Table[OrderedUnion[i][l]; // Timing, {i, 1, 5}]
>> Out[7]=
>> {{1.16667 Second, Null}, {2.06667 Second, Null}, {36.3 Second,
>>     Null}, {403.417 Second, Null}, {4.93333 Second, Null}}
>>
>> with OrderedUnion[4] doing slightly worse than before. For a list with
>> only a few digits I get:
>>
>> In[8]:=
>> l1 = Table[Random[Integer, {1, 50}], {10000}];
>>
>> In[9]:=
>> Table[OrderedUnion[i][l1]; // Timing, {i, 1, 5}]
>> Out[9]=
>> {{0.166667 Second, Null}, {0.05 Second, Null}, {1.01667 Second,
>>     Null}, {0.566667 Second, Null}, {1.16667 Second, Null}}
>>
>> So on my machine it is only for such types of lists that OrderedUnion[4]
>> does relatively well. There may be something like the Split problem
>> involved here, but I do not think it's wrth the time and effort it would
>> take to investigate this. I am now inclined to think that cross-platform
>> timings in Mathematica are almost meaningless.
>>
>> On Sat, Jul 10, 1999, Carl K.Woll <carlw at fermi.phys.washington.edu> wrote:
>>
>> >Andrzej,
>> >
>> >Your results are very different from those I get with Mathematica V3.0
>> on a 100
>> >MHz PC
>> >with Windows 95. For your l2 list, I got
>> >
>> >{{1.93 Second, Null}, {6.04 Second, Null},
>> > {43.12 Second, Null}, {5.65 Second, Null}}
>> >
>> >If I try using the following list
>> >
>> >Table[Random[Integer, {1, 5000}], {10000}];
>> >
>> >I get the following timings
>> >
>> >{{7.47 Second, Null}, {14.4 Second, Null},
>> > {177.24 Second, Null}, {6.98 Second, Null}}
>> >
>> >So, for some reason OrderedUnion[4] is much slower on your machine than it
is
>> >on mine.
>> >That is, for the list l2, OrderedUnion[4] is 66 times slower than
>> >OrderedUnion[1] on your
>> >machine, but it is only 3 times slower on my machine. This is odd, since
>> >OrderedUnion[4]
>> >uses Sort and Union, and I thought Colin Rose suggested that Sort and
>> Union are
>> >much
>> >faster in Version 4.0 than in Version 3.0. Hence, I would have thought that
>> >OrderedUnion[4] would have worked even better on your machine.
>> >
>> >Carl Woll
>> >Dept of Physics
>> >U of Washington
>> >
>> >Andrzej Kozlowski wrote:
>> >
>> >> Here are some more results on my PowerBook G3 (233 mghz, Mathematica 4.0):
>> >>
>> >> In[1]:=
>> >> OrderedUnion[1][li_] :=
>> >>   Block[{i, Sequence},
>> >>                   i[n_] := (i[n] = Sequence[]; n);
>> >>                   i /@ li]
>> >>
>> >> OrderedUnion[2][li_] :=
>> >>   Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence},
>> >>     i[n_] := (i[n] = Sequence[]; counter = counter + 1; n);
>> >>    Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li];
>> >>     Flatten[l0]]
>> >>
>> >> OrderedUnion[3][h_[args__]] :=
>> >>   Fold[ If[ FreeQ[#1, #2, 1], Join[#1, h[#2]], #1] & , h[], {args}]
>> >>
>> >> OrderedUnion[4][li_] :=
>> >>   Module[{tmp = Transpose[{li, Range[Length[li]]}]},
>> >>     tmp = Union[tmp, SameTest -> (#1[[1]] === #2[[1]] &)];
>> >>     Last /@ Sort[RotateLeft /@ tmp]]
>> >>
>> >> First a long list with a small number of distinnct elements:
>> >>
>> >> In[5]:=
>> >> l1 = Table[Random[Integer, {1, 10}], {10000}];
>> >>
>> >> In[6]:=
>> >> Table[OrderedUnion[i][l1]; // Timing, {i, 1, 4}]
>> >> Out[6]=
>> >> {{0.183333 Second, Null}, {0.0333333 Second, Null},
>> >>
>> >>   {0.75 Second, Null}, {0.533333 Second, Null}}
>> >>
>> >> Next a long list with quite many distinct elements:
>> >>
>> >> l2 = Table[Random[Integer, {1, 1000}], {10000}];
>> >>
>> >> In[8]:=
>> >> Table[OrderedUnion[i][l2]; // Timing, {i, 1, 4}]
>> >> Out[8]=
>> >> {{0.3 Second, Null}, {1.05 Second, Null},
>> >>
>> >>   {8.48333 Second, Null}, {22.0333 Second, Null}}
>> >>
>> >> --
>> >> Andrzej Kozlowski
>> >> Toyama International University
>> >> JAPAN
>> >> http://sigma.tuins.ac.jp
>> >> http://eri2.tuins.ac.jp
>> >>
>> >> ----------
>> >> >From: "Ersek, Ted R" <ErsekTR at navair.navy.mil>
To: mathgroup at smc.vnet.net
> To: mathgroup at smc.vnet.net
>> >> To: mathgroup at smc.vnet.net
>> >> >To: mathgroup at smc.vnet.net
>> >> >Subject: [mg18778] [mg18722] [mg18556] [mg18525] RE: [mg18428] Re: [mg18344]
> Re: [mg18308]
>> >easiest way to
>> >> sort a list?
>> >> >Date: Fri, Jul 9, 1999, 11:32 AM
>> >> >
>> >>
>> >> > I have a new approach to add to this long thread.
>> >> >
>> >> > In[1]:=
>> >> > OrderedUnion5[h_[args__]]:=
>> >> >   Fold[ If[ FreeQ[#1,#2,1], Join[#1,h[#2]], #1]& ,h[],{args}]
>> >> >
>> >> > In[2]:=
>> >> > OrderedUnion5[{5,3,5,3,1,2,2,3,4,5,3}]
>> >> >
>> >> > Out[2]=
>> >> > {5,3,1,2,4}
>> >> >
>> >> >
>> >> > I would like to see how the timings of different methods compare
(version
>> >> > above included).  I would do it, but I am working under Windows 98, and
I
>> >> > understand Microsoft made it impossible for Timing to give accurate
>> >results.
>> >> >
>> >> > Regards,
>> >> > Ted Ersek
>> >> >
>> >> > --------------------
>> >> > Hi Hartmut (and newsgroup),
>> >> >
>> >> > Thanks for unraveling the linear nature of the Block[{i... function.
Nice
>> >> > explanation!
>> >> >
>> >> > Since this topic shows no sign of slowing down, I thought I would
present
>> >a
>> >> > couple refined versions of my two functions. First, an improvement of
the
>> >> > "linear" solution:
>> >> >
>> >> > OrderedUnion[li_]:=Block[{i,Sequence},
>> >> >     i[n_]:=(i[n]=Sequence[];n);
>> >> >     i/@li]
>> >> >
>> >> > By including Sequence in the Block variables, Mathematica doesn't waste
>> >time
>> >> > eliminating Sequences until after the Block is exited. This seems to be
a
>> >> > 5-10% or so improvement in speed.
>> >> >
>> >> > Secondly, an improvement of my O(n log n) solution:
>> >> >
>> >> > OrderedUnion[3][li_]:=Module[{tmp=Transpose[{li,Range[Length[li]]}]},
>> >> >   tmp=Union[tmp, SameTest->(#1[[1]]===#2[[1]]&)];
>> >> >   Last/@Sort[RotateLeft/@tmp]]
>> >> >
>> >> > The key idea here is that Sort with the default ordering function is
much
>> >> > faster than Sort with a user supplied ordering function. So, RotateLeft
>> >> > switches the order of the index and the element to allow Sort to be used
>> >> > with its default ordering function. For lists with a large number of
>> >> > distinct elements, this seems to improve the speed by a factor of 3.
>> >> >
>> >> > Carl Woll
>> >> > Dept of Physics
>> >> > U of Washington
>> >> >
>> >> > Wolf, Hartmut wrote:
>> >> >
>> >> >> Hello everone,
>> >> >>
>> >> >> I just sent a letter to Carl K.Woll, part of which I would like to
>> >> >> publish to the community:
>> >> >>
>> >> >> Hello Carl,
>> >> >>
>> >> >> congratulations for your brilliant solution of "easiest way to sort a
>> >> >> list?" At the first glance, ... I was already struck.
>> >> >>
>> >> >> According to my measurements, you proposed not only the fastest
>> >> >> solution, but also the second fastest, the fastest O[n log n] version.
>> >> >> ....
>> >> >>
>> >> >> The question that concerned me was, is your solution really O[n]? My
>> >> >> preoccupation before was that O[n log n] could be the best achievable.
>> >> >> Clearly your solution is O[n] for 'smaller' samples, but the question
>> >> >> was, is it still O[n] for very large samples?  So I did some
>> >> >> measurements, which I'd like to show you. They are in the two attached
>> >> >> notebooks (...not for you, unless you request it from
>> >> >> mailto:hwolf at debis.com )
>> >> >>
>> >> >> What is the interpretation of the graphs? In fact your solution is as
>> >> >> far as I could calculate seemingly linear overall. There are
interesting
>> >> >> wiggles in it. The algorithm seems to become increasingly worse, when
it
>> >> >> makes up it's mind and "starts anew". Well I think this are the traces
>> >> >> of Mathematica's Extendible Hashing algorithm for the symbol store.
When
>> >> >> hashing becomes more and more inefficient (due to collisions) then the
>> >> >> store ist extended. (This conforms to my observation of memory usage
>> >> >> with Windows NT Task Manager during the calculation, where I say
periods
>> >> >> of rapid increase of used memory and times of slower grow.) So it's
this
>> >> >> extension, which keeps your algorithm linear.
>> >> >>
>> >> >> ....
>> >> >>
>> >> >> With kind regards, your
>> >> >>         Hartmut Wolf
>> >> >
>> >
>> >
>> >
>> >
>> >
>>
>> Andrzej Kozlowski
>> Toyama International University
>> JAPAN
>> http://sigma.tuins.ac.jp/
>> http://eri2.tuins.ac.jp/
> 


  • Prev by Date: Re: Is there a FAQ? (Clear all)
  • Next by Date: Re: Cumulative distribution of Gauss
  • Previous by thread: Re: Re: RE: Re: Re: easiest way to sort a list?
  • Next by thread: Re: "At long last, Sir, have you no shame?"