Re: Re: programming DeleteRepetitions

*To*: mathgroup at smc.vnet.net*Subject*: [mg23783] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions*From*: Andrzej Kozlowski <andrzej at tuins.ac.jp>*Date*: Sat, 10 Jun 2000 02:59:28 -0400 (EDT)*Sender*: owner-wri-mathgroup at wolfram.com

Allan, We discussed this issue in some detail almost exactly a year ago, when Carl Woll came up with his OrderedUnion function. We then concluded that Carl's function was the fastest when the number of different elements (i.e. Length[Union[list]]) was relatively large compared with the total number of elements (Length[list]), but was not so fast when there were only a few distinct elements in a large list (e.g. when throwing a die a large number of times). That is because in such a case Carl's function goes on looking for repetitions long time after it found all the distinct elements. At that time I wrote the following modification of Carl's function which uses Union to find out how many distinct elements there are and then gets out once it has found them all. DR0[li_List] := Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence}, i[n_] := (i[n] = Sequence[]; counter = counter + 1; n); Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li]; Flatten[l0]] In cases when the number of distinct elements is small (e.g. throwing a die 10000 times) in my tests it does better than all the functions you have tested. (Actually results are different with Mathematica 3.0 and 4.0. They are also rather different ona Mac (which I use) and under Windows. Actually, it may still be true that even in the case when the number of distinct elements is small Carl's function is assymptotically the fastest. This is because DR0 uses Union, which sorts the elements of the list from which it is removing repetitions . I am not sure however whether Union first sorts the elements and then removes repetitions or first removes repetitions and then sorts them (the sescond approach would require something like Carl's function). If the former is the case than Carl's function is certainly assymptotically the fastest, even if the number of distinct elements is small, but if the latter is true than it need not be so. -- Andrzej Kozlowski Toyama International University, JAPAN For Mathematica related links and resources try: <http://www.sstreams.com/Mathematica/> on 6/5/00 2:09 PM, Allan Hayes at hay at haystack.demon.co.uk wrote: > Bob, > I have added to your codings and done more timing. > > The full list is {DR1, ...., DR9} > > DR1 is a new version using Splot > DR2 is Carl Woll's code that I gave before (tidied up) > DR3 is a faster rewrite of your DeleteRepetitions1 > DR5 is your DeleteRepetitions1 > The rest are your other examples reordered (your numbering is given against > the code). > > In the timings: > test[r, n] gives the timings for DR1, ...DRn on a list of 1000 random > integers between 1 and r. > > DR6, your DeleteRepetitions6, is fastest up to r = 20. > > **TIMINGS** > > test[2, 9] > {0.11, 0.06, 0., 0.22, 0.05, 0., 0.17, 0., 1.37} > test[10, 8] > {0.17, 0.05, 0.06, 0.22, 0.05, 0., 0.55, 0.} > test[20, 8] > {0.11, 0.05, 0.11, 0.22, 0.11, 0., 1.16, 0.16} > test[30, 8] > {0.1, 0.06, 0.11, 0.22, 0.11, 0.11, 1.49, 2.47} > test[40, 8] > {0.11, 0.05, 0.17, 0.38, 0.17, 0.16, 1.87, 4.28} > test[60, 6] > {0.11, 0.11, 0.22, 0.27, 0.22, 0.66} > test[100, 6] > {0.16, 0.06, 0.05, 0.49, 0.44, 0.61} > test[200, 6] > {0.11, 0.17, 0.11, 0.49, 0.94, 3.57} > test[400, 6] > {0.17, 0.27, 0.16, 0.94, 2.42, 4.99} > test[600, 6] > {0.16, 0.28, 0.22, 1.43, 3.79, 6.53} > test[1000, 6] > {0.16, 0.39, 0.33, 3.07, 5.93, 7.75} > test[1000000000000, 6] (*almost certainly no repetitions*) > {0.38, 0.55, 5.05, 5.28, 5.16, 5.33} > > > ** CODE** > > Remove["`*"]; > DR1[x_List] := > Last[Transpose[ > Sort[Reverse /@ > First /@ > Split[Sort[ > Transpose[{x, > Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]]; > > DF2[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n); > i /@ x]; > > DF3[x_List] := x[[Sort[Flatten[First[Position[x, #]] & /@ Union[x]]]]]; > > (*2*)DF4[x_List] := > Module[{uniq = {}}, > If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x; > uniq]; > (*1*)DF5[x_List] := > Take[x, #] & /@ Sort[First /@ (Position[x, #] & /@ Union[x])] // > Flatten; > > (*6*)DF6[x_List] := > Module[{uniq = Union[x], n, portion}, n = Length[uniq]; > While[(Union[portion = Take[x, n]]) != uniq, n++]; > Take[portion, #] & /@ Sort[First /@ (Position[portion, #] & /@ uniq)] > // > Flatten]; > > (*2*)DF7[x_List] := > Transpose[ > Union[Transpose[Join[{Range[Length[x]]}, {x}]], > SameTest -> (#1[[2]] == #2[[2]] &)]][[2]] > > (*5*)DF8[x_List] := > Module[{uniq = Union[x], n, portion}, n = Length[uniq]; > While[(Union[portion = Take[x, n]]) != uniq, n++]; > portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}] > > (*4*)DF9[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b, c, d} > > **TEST CODE** > > funcs = ToExpression[Names["DF*"]] > > {DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8, DF9} > > SameQ @@ Through[funcs[Table[Random[Integer, {1, 10}], {100}]]] > > True > > test[r_, n_] := > With[{lst = Table[Random[Integer, {1, r}], {1000}]}, > Timing[#[lst]][[1]]/Second & /@ Take[funcs, n]] > > > > -- > Allan > --------------------- > Allan Hayes > Mathematica Training and Consulting > Leicester UK > www.haystack.demon.co.uk > hay at haystack.demon.co.uk > Voice: +44 (0)116 271 4198 > Fax: +44 (0)870 164 0565 > > <BobHanlon at aol.com> wrote in message news:8gv88o$5n7 at smc.vnet.net... >> >> In a message dated 5/28/2000 11:37:53 PM, pnichols at wittenberg.edu writes: >> >>> Below I give a function which removes duplicates from a list (as Union >>> does), but without sorting the result (as Union also does). More >>> specifically, it extracts, in order, the first instance of each distinct >>> element of the list. >>> >>> Is there any simpler way to do this? It's a simple idea, but I seem to >>> need seven different list-manipulation functions, including 3 uses of > Map! >>> >>> DeleteRepetitions[X_] := >>> Take[X, #] & /@ >>> Sort[First /@ >>> (Position[X, #] & /@ >>> Union[X])] // Flatten >>> >>> For example, >>> >>> In[2] := DeleteRepetitions[{3,1,2,3,3,2,4,1}] >>> >>> Out[2] = {3,1,2,4} >>> >>> In[3] := DeleteRepetitions[{b,a,b,a,c,a}] >>> >>> Out[3] = {b,a,c} >>> >>> I don't need to use this function on lists longer that 20 or so elements, >>> so speed is not a critical concern. >>> >>> Also, my version happens to work on expressions with heads other than > List >>> (because Take, Position, Union, and Flatten all do so), but I don't > really >>> need that feature. >>> >>> How would you implement this function? >>> >> >> DeleteRepetitions1[x_List] := >> >> Take[x, #] & /@ >> >> Sort[First /@ >> (Position[x, #] & /@ >> Union[x])] // >> Flatten >> >> DeleteRepetitions2[x_List] := Module[{uniq = {}}, >> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x; >> uniq] >> >> DeleteRepetitions3[x_List] := >> Transpose[ >> Union[Transpose[Join[{Range[Length[x]]}, {x}]], >> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]] >> >> DeleteRepetitions4[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b, > c, d} >> >> DeleteRepetitions5[x_List] := >> Module[{uniq = Union[x], n, portion}, >> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++]; >> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}] >> >> DeleteRepetitions6[x_List] := >> Module[{uniq = Union[x], n, portion}, >> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++]; >> Take[portion, #] & /@ >> Sort[First /@ >> (Position[portion, #] & /@ >> uniq)] // Flatten] >> >> testList = Table[Random[Integer, {1, 10}], {400}]; >> >> funcs = ToExpression[Names["DeleteRep*"]]; >> >> Demonstrating the equivalence of the functions >> >> Equal[Sequence @@ (#[testList] & /@ funcs)] >> >> True >> >> Comparing their timings >> >> Timing[#[testList]][[1]] & /@ funcs >> >> {0.01666666666665151*Second, 0.049999999999954525*Second, >> 0.10000000000002274*Second, 2.2999999999999545*Second, >> 0.016666666666765195*Second, 0.*Second} >> >> The fourth method is very slow for long lists. The fifth method speeds > this >> up provided that the list is highly redundant. The sixth method applies > this >> potential speed up to the first (original) method. >> >> Bob >> >> BobHanlon at aol.com >> > > >