MathGroup Archive: June 2000 [00114]

[Date Index] [Thread Index] [Author Index]
Re: Re: programming DeleteRepetitions
To: mathgroup at smc.vnet.net
Subject: [mg23783] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions
From: Andrzej Kozlowski <andrzej at tuins.ac.jp>
Date: Sat, 10 Jun 2000 02:59:28 -0400 (EDT)
Sender: owner-wri-mathgroup at wolfram.com
Allan,

We discussed this issue in some detail almost exactly a year ago, when Carl
Woll came up with his OrderedUnion function. We then concluded that Carl's
function was the fastest when the number of different elements (i.e.
Length[Union[list]]) was relatively large compared with the total number of
elements (Length[list]), but was not so fast when there were only a few
distinct elements in a large list (e.g. when throwing a die a large number
of times). That is because in such a case Carl's function goes on looking
for repetitions long time after it found all the distinct elements. At that
time I wrote the following modification of Carl's function which uses Union
to find out how many distinct elements there are and then gets out once it
has found them all.



DR0[li_List] :=
  Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence},
    i[n_] := (i[n] = Sequence[]; counter = counter + 1; n);
   Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li];
    Flatten[l0]]
 
In cases when the number of distinct elements is small (e.g. throwing a die
10000 times) in my tests it does better than all the functions you have
tested. (Actually results are different with Mathematica 3.0 and 4.0. They
are also rather different ona Mac (which I use) and under Windows.

Actually, it may still be true that even in the case when the number of
distinct elements is small Carl's function is assymptotically the fastest.
This is because DR0 uses Union, which sorts the elements of the list from
which it is removing repetitions . I am not sure however whether Union first
sorts the elements and then removes repetitions or first removes repetitions
and then sorts them (the sescond approach would require something like
Carl's function). If the former is the case than Carl's function is
certainly assymptotically the fastest, even if the number of distinct
elements is small, but if the latter is true than it need not be so.


-- 
Andrzej Kozlowski
Toyama International University, JAPAN

For Mathematica related links and resources try:
<http://www.sstreams.com/Mathematica/>



on 6/5/00 2:09 PM, Allan Hayes at hay at haystack.demon.co.uk wrote:

> Bob,
> I have added to your codings and done more timing.
> 
> The full list is {DR1, ...., DR9}
> 
> DR1 is a new version using Splot
> DR2 is Carl Woll's code that I gave before (tidied up)
> DR3 is a faster rewrite of your DeleteRepetitions1
> DR5 is your DeleteRepetitions1
> The rest are your other examples reordered (your numbering is given against
> the code).
> 
> In the timings:
> test[r, n] gives the timings for  DR1, ...DRn  on a list of 1000 random
> integers between 1 and r.
> 
> DR6, your DeleteRepetitions6, is fastest up to r = 20.
> 
> **TIMINGS**
> 
> test[2, 9]
> {0.11, 0.06, 0., 0.22, 0.05, 0., 0.17, 0., 1.37}
> test[10, 8]
> {0.17, 0.05, 0.06, 0.22, 0.05, 0., 0.55, 0.}
> test[20, 8]
> {0.11, 0.05, 0.11, 0.22, 0.11, 0., 1.16, 0.16}
> test[30, 8]
> {0.1, 0.06, 0.11, 0.22, 0.11, 0.11, 1.49, 2.47}
> test[40, 8]
> {0.11, 0.05, 0.17, 0.38, 0.17, 0.16, 1.87, 4.28}
> test[60, 6]
> {0.11, 0.11, 0.22, 0.27, 0.22, 0.66}
> test[100, 6]
> {0.16, 0.06, 0.05, 0.49, 0.44, 0.61}
> test[200, 6]
> {0.11, 0.17, 0.11, 0.49, 0.94, 3.57}
> test[400, 6]
> {0.17, 0.27, 0.16, 0.94, 2.42, 4.99}
> test[600, 6]
> {0.16, 0.28, 0.22, 1.43, 3.79, 6.53}
> test[1000, 6]
> {0.16, 0.39, 0.33, 3.07, 5.93, 7.75}
> test[1000000000000, 6] (*almost certainly no repetitions*)
> {0.38, 0.55, 5.05, 5.28, 5.16, 5.33}
> 
> 
> ** CODE**
> 
> Remove["`*"];
> DR1[x_List] :=
> Last[Transpose[
> Sort[Reverse /@
> First /@
> Split[Sort[
> Transpose[{x,
> Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]];
> 
> DF2[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n);
> i /@ x];
> 
> DF3[x_List] := x[[Sort[Flatten[First[Position[x, #]] & /@ Union[x]]]]];
> 
> (*2*)DF4[x_List] :=
> Module[{uniq = {}},
> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x;
> uniq];
> (*1*)DF5[x_List] :=
> Take[x, #] & /@ Sort[First /@ (Position[x, #] & /@ Union[x])] //
> Flatten;
> 
> (*6*)DF6[x_List] :=
> Module[{uniq = Union[x], n, portion}, n = Length[uniq];
> While[(Union[portion = Take[x, n]]) != uniq, n++];
> Take[portion, #] & /@ Sort[First /@ (Position[portion, #] & /@ uniq)]
> //
> Flatten];
> 
> (*2*)DF7[x_List] :=
> Transpose[
> Union[Transpose[Join[{Range[Length[x]]}, {x}]],
> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]]
> 
> (*5*)DF8[x_List] :=
> Module[{uniq = Union[x], n, portion}, n = Length[uniq];
> While[(Union[portion = Take[x, n]]) != uniq, n++];
> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}]
> 
> (*4*)DF9[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b, c, d}
> 
> **TEST CODE**
> 
> funcs = ToExpression[Names["DF*"]]
> 
> {DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8, DF9}
> 
> SameQ @@ Through[funcs[Table[Random[Integer, {1, 10}], {100}]]]
> 
> True
> 
> test[r_, n_] :=
> With[{lst = Table[Random[Integer, {1, r}], {1000}]},
> Timing[#[lst]][[1]]/Second & /@ Take[funcs, n]]
> 
> 
> 
> --
> Allan
> ---------------------
> Allan Hayes
> Mathematica Training and Consulting
> Leicester UK
> www.haystack.demon.co.uk
> hay at haystack.demon.co.uk
> Voice: +44 (0)116 271 4198
> Fax: +44 (0)870 164 0565
> 
> <BobHanlon at aol.com> wrote in message news:8gv88o$5n7 at smc.vnet.net...
>> 
>> In a message dated 5/28/2000 11:37:53 PM, pnichols at wittenberg.edu writes:
>> 
>>> Below I give a function which removes duplicates from a list (as Union
>>> does), but without sorting the result (as Union also does).  More
>>> specifically, it extracts, in order, the first instance of each distinct
>>> element of the list.
>>> 
>>> Is there any simpler way to do this?  It's a simple idea, but I seem to
>>> need seven different list-manipulation functions, including 3 uses of
> Map!
>>> 
>>> DeleteRepetitions[X_] :=
>>> Take[X, #] & /@
>>> Sort[First /@
>>> (Position[X, #] & /@
>>> Union[X])] // Flatten
>>> 
>>> For example,
>>> 
>>> In[2] := DeleteRepetitions[{3,1,2,3,3,2,4,1}]
>>> 
>>> Out[2] = {3,1,2,4}
>>> 
>>> In[3] := DeleteRepetitions[{b,a,b,a,c,a}]
>>> 
>>> Out[3] = {b,a,c}
>>> 
>>> I don't need to use this function on lists longer that 20 or so elements,
>>> so speed is not a critical concern.
>>> 
>>> Also, my version happens to work on expressions with heads other than
> List
>>> (because Take, Position, Union, and Flatten all do so), but I don't
> really
>>> need that feature.
>>> 
>>> How would you implement this function?
>>> 
>> 
>> DeleteRepetitions1[x_List] :=
>> 
>> Take[x, #] & /@
>> 
>> Sort[First /@
>> (Position[x, #] & /@
>> Union[x])] //
>> Flatten
>> 
>> DeleteRepetitions2[x_List] := Module[{uniq = {}},
>> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x;
>> uniq]
>> 
>> DeleteRepetitions3[x_List] :=
>> Transpose[
>> Union[Transpose[Join[{Range[Length[x]]}, {x}]],
>> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]]
>> 
>> DeleteRepetitions4[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b,
> c, d}
>> 
>> DeleteRepetitions5[x_List] :=
>> Module[{uniq = Union[x], n, portion},
>> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++];
>> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}]
>> 
>> DeleteRepetitions6[x_List] :=
>> Module[{uniq = Union[x], n, portion},
>> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++];
>> Take[portion, #] & /@
>> Sort[First /@
>> (Position[portion, #] & /@
>> uniq)] // Flatten]
>> 
>> testList = Table[Random[Integer, {1, 10}], {400}];
>> 
>> funcs = ToExpression[Names["DeleteRep*"]];
>> 
>> Demonstrating the equivalence of the functions
>> 
>> Equal[Sequence @@ (#[testList] & /@ funcs)]
>> 
>> True
>> 
>> Comparing their timings
>> 
>> Timing[#[testList]][[1]] & /@ funcs
>> 
>> {0.01666666666665151*Second, 0.049999999999954525*Second,
>> 0.10000000000002274*Second, 2.2999999999999545*Second,
>> 0.016666666666765195*Second, 0.*Second}
>> 
>> The fourth method is very slow for long lists. The fifth method speeds
> this
>> up provided that the list is highly redundant. The sixth method applies
> this
>> potential speed up to the first (original) method.
>> 
>> Bob
>> 
>> BobHanlon at aol.com
>> 
> 
> 
>
Prev by Date: Re: Round off or up
Next by Date: Re: DeleteRepetitions summary
Previous by thread: Re: Re: programming DeleteRepetitions
Next by thread: Mathematica Training