Re(2): Re: RE: Re: Re: easiest way to sort a list?

• To: mathgroup at smc.vnet.net
• Subject: [mg18598] Re(2): [mg18556] Re: [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308] easiest way to sort a list?
• From: Andrzej Kozlowski <andrzej at tuins.ac.jp>
• Date: Tue, 13 Jul 1999 01:01:29 -0400
• Sender: owner-wri-mathgroup at wolfram.com

```Actually, OrderedUnion[4] is dreadfully slow on my Mac for this last
list. Here is what I get:

In[16]:=
Table[OrderedUnion[i][l3]; // Timing, {i, 1, 4}]
Out[16]=
{{1.16667 Second, Null}, {2.15 Second, Null}, {38.5833 Second,
Null}, {396.433 Second, Null}}

And yet Sort and Union are indeed in version 4. You can see this when I
add Colin Rose's function to the test list:

OrderedUnion[5][lis_] :=

Part[lis, Sort[Map[Position[lis, #][[1]] &, Union[lis]] // Flatten]]

Running all five in a fresh Mathematica 4.0 session I get:

In[6]:=
l = Table[Random[Integer, {1, 5000}], {10000}];

In[7]:=
Table[OrderedUnion[i][l]; // Timing, {i, 1, 5}]
Out[7]=
{{1.16667 Second, Null}, {2.06667 Second, Null}, {36.3 Second,
Null}, {403.417 Second, Null}, {4.93333 Second, Null}}

with OrderedUnion[4] doing slightly worse than before. For a list with
only a few digits I get:

In[8]:=
l1 = Table[Random[Integer, {1, 50}], {10000}];

In[9]:=
Table[OrderedUnion[i][l1]; // Timing, {i, 1, 5}]
Out[9]=
{{0.166667 Second, Null}, {0.05 Second, Null}, {1.01667 Second,
Null}, {0.566667 Second, Null}, {1.16667 Second, Null}}

So on my machine it is only for such types of lists that OrderedUnion[4]
does relatively well. There may be something like the Split problem
involved here, but I do not think it's wrth the time and effort it would
take to investigate this. I am now inclined to think that cross-platform
timings in Mathematica are almost meaningless.

On Sat, Jul 10, 1999, Carl K.Woll <carlw at fermi.phys.washington.edu> wrote:

>Andrzej,
>
>Your results are very different from those I get with Mathematica V3.0
on a 100
>MHz PC
>with Windows 95. For your l2 list, I got
>
>{{1.93 Second, Null}, {6.04 Second, Null},
> {43.12 Second, Null}, {5.65 Second, Null}}
>
>If I try using the following list
>
>Table[Random[Integer, {1, 5000}], {10000}];
>
>I get the following timings
>
>{{7.47 Second, Null}, {14.4 Second, Null},
> {177.24 Second, Null}, {6.98 Second, Null}}
>
>So, for some reason OrderedUnion[4] is much slower on your machine than it is
>on mine.
>That is, for the list l2, OrderedUnion[4] is 66 times slower than
>OrderedUnion[1] on your
>machine, but it is only 3 times slower on my machine. This is odd, since
>OrderedUnion[4]
>uses Sort and Union, and I thought Colin Rose suggested that Sort and
Union are
>much
>faster in Version 4.0 than in Version 3.0. Hence, I would have thought that
>OrderedUnion[4] would have worked even better on your machine.
>
>Carl Woll
>Dept of Physics
>U of Washington
>
>Andrzej Kozlowski wrote:
>
>> Here are some more results on my PowerBook G3 (233 mghz, Mathematica 4.0):
>>
>> In[1]:=
>> OrderedUnion[1][li_] :=
>>   Block[{i, Sequence},
>>                   i[n_] := (i[n] = Sequence[]; n);
>>                   i /@ li]
>>
>> OrderedUnion[2][li_] :=
>>   Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence},
>>     i[n_] := (i[n] = Sequence[]; counter = counter + 1; n);
>>    Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li];
>>     Flatten[l0]]
>>
>> OrderedUnion[3][h_[args__]] :=
>>   Fold[ If[ FreeQ[#1, #2, 1], Join[#1, h[#2]], #1] & , h[], {args}]
>>
>> OrderedUnion[4][li_] :=
>>   Module[{tmp = Transpose[{li, Range[Length[li]]}]},
>>     tmp = Union[tmp, SameTest -> (#1[[1]] === #2[[1]] &)];
>>     Last /@ Sort[RotateLeft /@ tmp]]
>>
>> First a long list with a small number of distinnct elements:
>>
>> In[5]:=
>> l1 = Table[Random[Integer, {1, 10}], {10000}];
>>
>> In[6]:=
>> Table[OrderedUnion[i][l1]; // Timing, {i, 1, 4}]
>> Out[6]=
>> {{0.183333 Second, Null}, {0.0333333 Second, Null},
>>
>>   {0.75 Second, Null}, {0.533333 Second, Null}}
>>
>> Next a long list with quite many distinct elements:
>>
>> l2 = Table[Random[Integer, {1, 1000}], {10000}];
>>
>> In[8]:=
>> Table[OrderedUnion[i][l2]; // Timing, {i, 1, 4}]
>> Out[8]=
>> {{0.3 Second, Null}, {1.05 Second, Null},
>>
>>   {8.48333 Second, Null}, {22.0333 Second, Null}}
>>
>> --
>> Andrzej Kozlowski
>> Toyama International University
>> JAPAN
>> http://sigma.tuins.ac.jp
>> http://eri2.tuins.ac.jp
>>
>> ----------
>> >From: "Ersek, Ted R" <ErsekTR at navair.navy.mil>
To: mathgroup at smc.vnet.net
>> To: mathgroup at smc.vnet.net
>> >To: mathgroup at smc.vnet.net
>> >Subject: [mg18598] [mg18556] [mg18525] RE: [mg18428] Re: [mg18344] Re: [mg18308]
>easiest way to
>> sort a list?
>> >Date: Fri, Jul 9, 1999, 11:32 AM
>> >
>>
>> > I have a new approach to add to this long thread.
>> >
>> > In[1]:=
>> > OrderedUnion5[h_[args__]]:=
>> >   Fold[ If[ FreeQ[#1,#2,1], Join[#1,h[#2]], #1]& ,h[],{args}]
>> >
>> > In[2]:=
>> > OrderedUnion5[{5,3,5,3,1,2,2,3,4,5,3}]
>> >
>> > Out[2]=
>> > {5,3,1,2,4}
>> >
>> >
>> > I would like to see how the timings of different methods compare (version
>> > above included).  I would do it, but I am working under Windows 98, and I
>> > understand Microsoft made it impossible for Timing to give accurate
>results.
>> >
>> > Regards,
>> > Ted Ersek
>> >
>> > --------------------
>> > Hi Hartmut (and newsgroup),
>> >
>> > Thanks for unraveling the linear nature of the Block[{i... function. Nice
>> > explanation!
>> >
>> > Since this topic shows no sign of slowing down, I thought I would present
>a
>> > couple refined versions of my two functions. First, an improvement of the
>> > "linear" solution:
>> >
>> > OrderedUnion[li_]:=Block[{i,Sequence},
>> >     i[n_]:=(i[n]=Sequence[];n);
>> >     i/@li]
>> >
>> > By including Sequence in the Block variables, Mathematica doesn't waste
>time
>> > eliminating Sequences until after the Block is exited. This seems to be a
>> > 5-10% or so improvement in speed.
>> >
>> > Secondly, an improvement of my O(n log n) solution:
>> >
>> > OrderedUnion[3][li_]:=Module[{tmp=Transpose[{li,Range[Length[li]]}]},
>> >   tmp=Union[tmp, SameTest->(#1[[1]]===#2[[1]]&)];
>> >   Last/@Sort[RotateLeft/@tmp]]
>> >
>> > The key idea here is that Sort with the default ordering function is much
>> > faster than Sort with a user supplied ordering function. So, RotateLeft
>> > switches the order of the index and the element to allow Sort to be used
>> > with its default ordering function. For lists with a large number of
>> > distinct elements, this seems to improve the speed by a factor of 3.
>> >
>> > Carl Woll
>> > Dept of Physics
>> > U of Washington
>> >
>> > Wolf, Hartmut wrote:
>> >
>> >> Hello everone,
>> >>
>> >> I just sent a letter to Carl K.Woll, part of which I would like to
>> >> publish to the community:
>> >>
>> >> Hello Carl,
>> >>
>> >> congratulations for your brilliant solution of "easiest way to sort a
>> >> list?" At the first glance, ... I was already struck.
>> >>
>> >> According to my measurements, you proposed not only the fastest
>> >> solution, but also the second fastest, the fastest O[n log n] version.
>> >> ....
>> >>
>> >> The question that concerned me was, is your solution really O[n]? My
>> >> preoccupation before was that O[n log n] could be the best achievable.
>> >> Clearly your solution is O[n] for 'smaller' samples, but the question
>> >> was, is it still O[n] for very large samples?  So I did some
>> >> measurements, which I'd like to show you. They are in the two attached
>> >> notebooks (...not for you, unless you request it from
>> >> mailto:hwolf at debis.com )
>> >>
>> >> What is the interpretation of the graphs? In fact your solution is as
>> >> far as I could calculate seemingly linear overall. There are interesting
>> >> wiggles in it. The algorithm seems to become increasingly worse, when it
>> >> makes up it's mind and "starts anew". Well I think this are the traces
>> >> of Mathematica's Extendible Hashing algorithm for the symbol store. When
>> >> hashing becomes more and more inefficient (due to collisions) then the
>> >> store ist extended. (This conforms to my observation of memory usage
>> >> with Windows NT Task Manager during the calculation, where I say periods
>> >> of rapid increase of used memory and times of slower grow.) So it's this
>> >> extension, which keeps your algorithm linear.
>> >>
>> >> ....
>> >>
>> >> With kind regards, your
>> >>         Hartmut Wolf
>> >
>
>
>
>
>

Andrzej Kozlowski
Toyama International University
JAPAN
http://sigma.tuins.ac.jp/
http://eri2.tuins.ac.jp/

```

• Prev by Date: Re: Not Plotting Vertical Assymptotes
• Next by Date: Re: Manipulating differential equations
• Previous by thread: HOW BEST for figs: Mathematica -> Textures -> PDF ? (long)
• Next by thread: importing multiple files