Re: Re: programming DeleteRepetitions

• To: mathgroup at smc.vnet.net
• Subject: [mg23796] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions
• From: "Allan Hayes" <hay at haystack.demon.co.uk>
• Date: Sat, 10 Jun 2000 02:59:49 -0400 (EDT)
• References: <B561C5D5.693C%andrzej@tuins.ac.jp>
• Sender: owner-wri-mathgroup at wolfram.com

```Adrzej,
Thanks for the reminder.

I have found a slightly faster way of stopping the calculatiion when all
possible values have been found.

DRp1 below is the new code
DRp3 is my code using Split
DRp4 is Carl's original code

test[r] gives the timings for 5 runs of these on a list of 10000 entries of
random integers from 1 to r.

test[2]           {0.39, 0.38, 6.87, 1.2}
test[10]         {0.44, 0.49, 6.7, 1.87}
test[100]        {1.04, 1.7, 7.8, 2.86}
test[500]        {3.13, 6.38, 6.7, 3.62}
test[1000]      {6.04, 13.62, 6.81, 5.06}
test[1200]      {8.01, 19.17, 6.98, 5.99}
test[1500]      {8.95, 20.93, 7.03, 6.76}
test[2000]      {11.31, 23.01, 7.2, 8.68}
test[5000]      {25.71, 36.14, 8.51, 19.12}
test[7500]       {32.13, 39.66, 9.56, 22.02}
test[10000]     {32.14, 41.9, 9.07, 22.96}

These times support your view that Carl's code would be faster than its
elaborations when r large. However, for r = 10000 the Splitting method is
much quicker.

Here are the codes

Remove["`*"];

DRp1[x_List] :=
Block[{i, j, c = 0, s = {}},
j[Length[Union[x]]] := Throw[s] ;
i[n_] := (s = {s, i[n] = n}; j[++c]);
Scan[i, x]
] // Catch // Flatten;

DRp2[li_List] :=
Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence},
i[n_] := (i[n] = Sequence[]; counter = counter + 1; n);
Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li];
Flatten[l0]];

DRp3[x_List] :=
Last[Transpose[
Sort[Reverse /@
First /@
Split[Sort[
Transpose[{x,
Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]];

DRp4[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n);
i /@ x];

And the testing code

funcs = ToExpression[Names["DR*"]]

{DRp1, DRp2, DRp3, DRp4}

test[r_] :=
With[{lst = Table[Random[Integer, {1, r}], {10000}]},
Timing[Do[#[lst], {5}]][[1]]/Second & /@ funcs]

Regaards

Allan
---------------------
Allan Hayes
Mathematica Training and Consulting
Leicester UK
www.haystack.demon.co.uk
hay at haystack.demon.co.uk
Voice: +44 (0)116 271 4198
Fax: +44 (0)870 164 0565

----- Original Message -----
From: "Andrzej Kozlowski" <andrzej at tuins.ac.jp>
To: mathgroup at smc.vnet.net
<BobHanlon at aol.com>
Subject: [mg23796] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions

> Allan,
>
> We discussed this issue in some detail almost exactly a year ago, when
Carl
> Woll came up with his OrderedUnion function. We then concluded that Carl's
> function was the fastest when the number of different elements (i.e.
> Length[Union[list]]) was relatively large compared with the total number
of
> elements (Length[list]), but was not so fast when there were only a few
> distinct elements in a large list (e.g. when throwing a die a large number
> of times). That is because in such a case Carl's function goes on looking
> for repetitions long time after it found all the distinct elements. At
that
> time I wrote the following modification of Carl's function which uses
Union
> to find out how many distinct elements there are and then gets out once it
> has found them all.
>
>
>
> DR0[li_List] :=
>   Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence},
>     i[n_] := (i[n] = Sequence[]; counter = counter + 1; n);
>    Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li];
>     Flatten[l0]]
>
> In cases when the number of distinct elements is small (e.g. throwing a
die
> 10000 times) in my tests it does better than all the functions you have
> tested. (Actually results are different with Mathematica 3.0 and 4.0. They
> are also rather different ona Mac (which I use) and under Windows.
>
> Actually, it may still be true that even in the case when the number of
> distinct elements is small Carl's function is assymptotically the fastest.
> This is because DR0 uses Union, which sorts the elements of the list from
> which it is removing repetitions . I am not sure however whether Union
first
> sorts the elements and then removes repetitions or first removes
repetitions
> and then sorts them (the sescond approach would require something like
> Carl's function). If the former is the case than Carl's function is
> certainly assymptotically the fastest, even if the number of distinct
> elements is small, but if the latter is true than it need not be so.
>
>
> --
> Andrzej Kozlowski
> Toyama International University, JAPAN
>
> For Mathematica related links and resources try:
> <http://www.sstreams.com/Mathematica/>
>
>
>
> on 6/5/00 2:09 PM, Allan Hayes at hay at haystack.demon.co.uk wrote:
>
> > Bob,
> > I have added to your codings and done more timing.
> >
> > The full list is {DR1, ...., DR9}
> >
> > DR1 is a new version using Splot
> > DR2 is Carl Woll's code that I gave before (tidied up)
> > DR3 is a faster rewrite of your DeleteRepetitions1
> > DR5 is your DeleteRepetitions1
> > The rest are your other examples reordered (your numbering is given
against
> > the code).
> >
> > In the timings:
> > test[r, n] gives the timings for  DR1, ...DRn  on a list of 1000 random
> > integers between 1 and r.
> >
> > DR6, your DeleteRepetitions6, is fastest up to r = 20.
> >
> > **TIMINGS**
> >
> > test[2, 9]
> > {0.11, 0.06, 0., 0.22, 0.05, 0., 0.17, 0., 1.37}
> > test[10, 8]
> > {0.17, 0.05, 0.06, 0.22, 0.05, 0., 0.55, 0.}
> > test[20, 8]
> > {0.11, 0.05, 0.11, 0.22, 0.11, 0., 1.16, 0.16}
> > test[30, 8]
> > {0.1, 0.06, 0.11, 0.22, 0.11, 0.11, 1.49, 2.47}
> > test[40, 8]
> > {0.11, 0.05, 0.17, 0.38, 0.17, 0.16, 1.87, 4.28}
> > test[60, 6]
> > {0.11, 0.11, 0.22, 0.27, 0.22, 0.66}
> > test[100, 6]
> > {0.16, 0.06, 0.05, 0.49, 0.44, 0.61}
> > test[200, 6]
> > {0.11, 0.17, 0.11, 0.49, 0.94, 3.57}
> > test[400, 6]
> > {0.17, 0.27, 0.16, 0.94, 2.42, 4.99}
> > test[600, 6]
> > {0.16, 0.28, 0.22, 1.43, 3.79, 6.53}
> > test[1000, 6]
> > {0.16, 0.39, 0.33, 3.07, 5.93, 7.75}
> > test[1000000000000, 6] (*almost certainly no repetitions*)
> > {0.38, 0.55, 5.05, 5.28, 5.16, 5.33}
> >
> >
> > ** CODE**
> >
> > Remove["`*"];
> > DR1[x_List] :=
> > Last[Transpose[
> > Sort[Reverse /@
> > First /@
> > Split[Sort[
> > Transpose[{x,
> > Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]];
> >
> > DF2[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n);
> > i /@ x];
> >
> > DF3[x_List] := x[[Sort[Flatten[First[Position[x, #]] & /@ Union[x]]]]];
> >
> > (*2*)DF4[x_List] :=
> > Module[{uniq = {}},
> > If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x;
> > uniq];
> > (*1*)DF5[x_List] :=
> > Take[x, #] & /@ Sort[First /@ (Position[x, #] & /@ Union[x])] //
> > Flatten;
> >
> > (*6*)DF6[x_List] :=
> > Module[{uniq = Union[x], n, portion}, n = Length[uniq];
> > While[(Union[portion = Take[x, n]]) != uniq, n++];
> > Take[portion, #] & /@ Sort[First /@ (Position[portion, #] & /@ uniq)]
> > //
> > Flatten];
> >
> > (*2*)DF7[x_List] :=
> > Transpose[
> > Union[Transpose[Join[{Range[Length[x]]}, {x}]],
> > SameTest -> (#1[[2]] == #2[[2]] &)]][[2]]
> >
> > (*5*)DF8[x_List] :=
> > Module[{uniq = Union[x], n, portion}, n = Length[uniq];
> > While[(Union[portion = Take[x, n]]) != uniq, n++];
> > portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}]
> >
> > (*4*)DF9[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b, c, d}
> >
> > **TEST CODE**
> >
> > funcs = ToExpression[Names["DF*"]]
> >
> > {DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8, DF9}
> >
> > SameQ @@ Through[funcs[Table[Random[Integer, {1, 10}], {100}]]]
> >
> > True
> >
> > test[r_, n_] :=
> > With[{lst = Table[Random[Integer, {1, r}], {1000}]},
> > Timing[#[lst]][[1]]/Second & /@ Take[funcs, n]]
> >
> >
> >
> > --
> > Allan
> > ---------------------
> > Allan Hayes
> > Mathematica Training and Consulting
> > Leicester UK
> > www.haystack.demon.co.uk
> > hay at haystack.demon.co.uk
> > Voice: +44 (0)116 271 4198
> > Fax: +44 (0)870 164 0565
> >
> > <BobHanlon at aol.com> wrote in message news:8gv88o\$5n7 at smc.vnet.net...
> >>
> >> In a message dated 5/28/2000 11:37:53 PM, pnichols at wittenberg.edu
writes:
> >>
> >>> Below I give a function which removes duplicates from a list (as Union
> >>> does), but without sorting the result (as Union also does).  More
> >>> specifically, it extracts, in order, the first instance of each
distinct
> >>> element of the list.
> >>>
> >>> Is there any simpler way to do this?  It's a simple idea, but I seem
to
> >>> need seven different list-manipulation functions, including 3 uses of
> > Map!
> >>>
> >>> DeleteRepetitions[X_] :=
> >>> Take[X, #] & /@
> >>> Sort[First /@
> >>> (Position[X, #] & /@
> >>> Union[X])] // Flatten
> >>>
> >>> For example,
> >>>
> >>> In[2] := DeleteRepetitions[{3,1,2,3,3,2,4,1}]
> >>>
> >>> Out[2] = {3,1,2,4}
> >>>
> >>> In[3] := DeleteRepetitions[{b,a,b,a,c,a}]
> >>>
> >>> Out[3] = {b,a,c}
> >>>
> >>> I don't need to use this function on lists longer that 20 or so
elements,
> >>> so speed is not a critical concern.
> >>>
> >>> Also, my version happens to work on expressions with heads other than
> > List
> >>> (because Take, Position, Union, and Flatten all do so), but I don't
> > really
> >>> need that feature.
> >>>
> >>> How would you implement this function?
> >>>
> >>
> >> DeleteRepetitions1[x_List] :=
> >>
> >> Take[x, #] & /@
> >>
> >> Sort[First /@
> >> (Position[x, #] & /@
> >> Union[x])] //
> >> Flatten
> >>
> >> DeleteRepetitions2[x_List] := Module[{uniq = {}},
> >> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x;
> >> uniq]
> >>
> >> DeleteRepetitions3[x_List] :=
> >> Transpose[
> >> Union[Transpose[Join[{Range[Length[x]]}, {x}]],
> >> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]]
> >>
> >> DeleteRepetitions4[x_List] := x //. {a___, b_, c___, b_, d___} -> {a,
b,
> > c, d}
> >>
> >> DeleteRepetitions5[x_List] :=
> >> Module[{uniq = Union[x], n, portion},
> >> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++];
> >> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}]
> >>
> >> DeleteRepetitions6[x_List] :=
> >> Module[{uniq = Union[x], n, portion},
> >> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++];
> >> Take[portion, #] & /@
> >> Sort[First /@
> >> (Position[portion, #] & /@
> >> uniq)] // Flatten]
> >>
> >> testList = Table[Random[Integer, {1, 10}], {400}];
> >>
> >> funcs = ToExpression[Names["DeleteRep*"]];
> >>
> >> Demonstrating the equivalence of the functions
> >>
> >> Equal[Sequence @@ (#[testList] & /@ funcs)]
> >>
> >> True
> >>
> >> Comparing their timings
> >>
> >> Timing[#[testList]][[1]] & /@ funcs
> >>
> >> {0.01666666666665151*Second, 0.049999999999954525*Second,
> >> 0.10000000000002274*Second, 2.2999999999999545*Second,
> >> 0.016666666666765195*Second, 0.*Second}
> >>
> >> The fourth method is very slow for long lists. The fifth method speeds
> > this
> >> up provided that the list is highly redundant. The sixth method applies
> > this
> >> potential speed up to the first (original) method.
> >>
> >> Bob
> >>
> >> BobHanlon at aol.com
> >>
> >
> >
> >
>

```