Re: Re: programming DeleteRepetitions

*To*: mathgroup at smc.vnet.net*Subject*: [mg23801] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions*From*: Andrzej Kozlowski <andrzej at tuins.ac.jp>*Date*: Sat, 10 Jun 2000 02:59:59 -0400 (EDT)*Sender*: owner-wri-mathgroup at wolfram.com

Allan, Unfortunately this is to some extent platform dependent. Split performs poorly on the Macinotosh (something about which I have written a few times to this list) so even for r=10000 your Splitting method performs worse on my PowerBook than Carl's function. In fact on the Mac it is possible to write a version of Split which performs better than the built in one, even for lists of moderate size and its advantage increases with the length of list, since the built in Split on the Mac does not scale linearly. on 6/6/00 6:20 AM, Allan Hayes at hay at haystack.demon.co.uk wrote: > Adrzej, > Thanks for the reminder. > > I have found a slightly faster way of stopping the calculatiion when all > possible values have been found. > > DRp1 below is the new code > DRp2 is your code > DRp3 is my code using Split > DRp4 is Carl's original code > > test[r] gives the timings for 5 runs of these on a list of 10000 entries of > random integers from 1 to r. > > test[2] {0.39, 0.38, 6.87, 1.2} > test[10] {0.44, 0.49, 6.7, 1.87} > test[100] {1.04, 1.7, 7.8, 2.86} > test[500] {3.13, 6.38, 6.7, 3.62} > test[1000] {6.04, 13.62, 6.81, 5.06} > test[1200] {8.01, 19.17, 6.98, 5.99} > test[1500] {8.95, 20.93, 7.03, 6.76} > test[2000] {11.31, 23.01, 7.2, 8.68} > test[5000] {25.71, 36.14, 8.51, 19.12} > test[7500] {32.13, 39.66, 9.56, 22.02} > test[10000] {32.14, 41.9, 9.07, 22.96} > > These times support your view that Carl's code would be faster than its > elaborations when r large. However, for r = 10000 the Splitting method is > much quicker. > > Here are the codes > > Remove["`*"]; > > DRp1[x_List] := > Block[{i, j, c = 0, s = {}}, > j[Length[Union[x]]] := Throw[s] ; > i[n_] := (s = {s, i[n] = n}; j[++c]); > Scan[i, x] > ] // Catch // Flatten; > > DRp2[li_List] := > Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence}, > i[n_] := (i[n] = Sequence[]; counter = counter + 1; n); > Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li]; > Flatten[l0]]; > > DRp3[x_List] := > Last[Transpose[ > Sort[Reverse /@ > First /@ > Split[Sort[ > Transpose[{x, > Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]]; > > DRp4[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n); > i /@ x]; > > > And the testing code > > funcs = ToExpression[Names["DR*"]] > > {DRp1, DRp2, DRp3, DRp4} > > test[r_] := > With[{lst = Table[Random[Integer, {1, r}], {10000}]}, > Timing[Do[#[lst], {5}]][[1]]/Second & /@ funcs] > > Regaards > > Allan > --------------------- > Allan Hayes > Mathematica Training and Consulting > Leicester UK > www.haystack.demon.co.uk > hay at haystack.demon.co.uk > Voice: +44 (0)116 271 4198 > Fax: +44 (0)870 164 0565 > > ----- Original Message ----- > From: "Andrzej Kozlowski" <andrzej at tuins.ac.jp> To: mathgroup at smc.vnet.net > To: "Allan Hayes" <hay at haystack.demon.co.uk>; <mathgroup at smc.vnet.net>; > <BobHanlon at aol.com> > Sent: 05 June 2000 13:02 > Subject: [mg23801] Re: [mg23733] Re: [mg23662] programming DeleteRepetitions > > >> Allan, >> >> We discussed this issue in some detail almost exactly a year ago, when > Carl >> Woll came up with his OrderedUnion function. We then concluded that Carl's >> function was the fastest when the number of different elements (i.e. >> Length[Union[list]]) was relatively large compared with the total number > of >> elements (Length[list]), but was not so fast when there were only a few >> distinct elements in a large list (e.g. when throwing a die a large number >> of times). That is because in such a case Carl's function goes on looking >> for repetitions long time after it found all the distinct elements. At > that >> time I wrote the following modification of Carl's function which uses > Union >> to find out how many distinct elements there are and then gets out once it >> has found them all. >> >> >> >> DR0[li_List] := >> Block[{i, counter = 0, l0 = {}, m = Length[Union[li]], Sequence}, >> i[n_] := (i[n] = Sequence[]; counter = counter + 1; n); >> Scan[If[counter < m, l0 = {l0, i[#]}, Return[Flatten[l0]]] &, li]; >> Flatten[l0]] >> >> In cases when the number of distinct elements is small (e.g. throwing a > die >> 10000 times) in my tests it does better than all the functions you have >> tested. (Actually results are different with Mathematica 3.0 and 4.0. They >> are also rather different ona Mac (which I use) and under Windows. >> >> Actually, it may still be true that even in the case when the number of >> distinct elements is small Carl's function is assymptotically the fastest. >> This is because DR0 uses Union, which sorts the elements of the list from >> which it is removing repetitions . I am not sure however whether Union > first >> sorts the elements and then removes repetitions or first removes > repetitions >> and then sorts them (the sescond approach would require something like >> Carl's function). If the former is the case than Carl's function is >> certainly assymptotically the fastest, even if the number of distinct >> elements is small, but if the latter is true than it need not be so. >> >> >> -- >> Andrzej Kozlowski >> Toyama International University, JAPAN >> >> For Mathematica related links and resources try: >> <http://www.sstreams.com/Mathematica/> >> >> >> >> on 6/5/00 2:09 PM, Allan Hayes at hay at haystack.demon.co.uk wrote: >> >>> Bob, >>> I have added to your codings and done more timing. >>> >>> The full list is {DR1, ...., DR9} >>> >>> DR1 is a new version using Splot >>> DR2 is Carl Woll's code that I gave before (tidied up) >>> DR3 is a faster rewrite of your DeleteRepetitions1 >>> DR5 is your DeleteRepetitions1 >>> The rest are your other examples reordered (your numbering is given > against >>> the code). >>> >>> In the timings: >>> test[r, n] gives the timings for DR1, ...DRn on a list of 1000 random >>> integers between 1 and r. >>> >>> DR6, your DeleteRepetitions6, is fastest up to r = 20. >>> >>> **TIMINGS** >>> >>> test[2, 9] >>> {0.11, 0.06, 0., 0.22, 0.05, 0., 0.17, 0., 1.37} >>> test[10, 8] >>> {0.17, 0.05, 0.06, 0.22, 0.05, 0., 0.55, 0.} >>> test[20, 8] >>> {0.11, 0.05, 0.11, 0.22, 0.11, 0., 1.16, 0.16} >>> test[30, 8] >>> {0.1, 0.06, 0.11, 0.22, 0.11, 0.11, 1.49, 2.47} >>> test[40, 8] >>> {0.11, 0.05, 0.17, 0.38, 0.17, 0.16, 1.87, 4.28} >>> test[60, 6] >>> {0.11, 0.11, 0.22, 0.27, 0.22, 0.66} >>> test[100, 6] >>> {0.16, 0.06, 0.05, 0.49, 0.44, 0.61} >>> test[200, 6] >>> {0.11, 0.17, 0.11, 0.49, 0.94, 3.57} >>> test[400, 6] >>> {0.17, 0.27, 0.16, 0.94, 2.42, 4.99} >>> test[600, 6] >>> {0.16, 0.28, 0.22, 1.43, 3.79, 6.53} >>> test[1000, 6] >>> {0.16, 0.39, 0.33, 3.07, 5.93, 7.75} >>> test[1000000000000, 6] (*almost certainly no repetitions*) >>> {0.38, 0.55, 5.05, 5.28, 5.16, 5.33} >>> >>> >>> ** CODE** >>> >>> Remove["`*"]; >>> DR1[x_List] := >>> Last[Transpose[ >>> Sort[Reverse /@ >>> First /@ >>> Split[Sort[ >>> Transpose[{x, >>> Range[Length[x]]}]], #1[[1]] == #2[[1]] &]]]]; >>> >>> DF2[x_List] := Block[{i}, i[n_] := (i[n] = Sequence[]; n); >>> i /@ x]; >>> >>> DF3[x_List] := x[[Sort[Flatten[First[Position[x, #]] & /@ Union[x]]]]]; >>> >>> (*2*)DF4[x_List] := >>> Module[{uniq = {}}, >>> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x; >>> uniq]; >>> (*1*)DF5[x_List] := >>> Take[x, #] & /@ Sort[First /@ (Position[x, #] & /@ Union[x])] // >>> Flatten; >>> >>> (*6*)DF6[x_List] := >>> Module[{uniq = Union[x], n, portion}, n = Length[uniq]; >>> While[(Union[portion = Take[x, n]]) != uniq, n++]; >>> Take[portion, #] & /@ Sort[First /@ (Position[portion, #] & /@ uniq)] >>> // >>> Flatten]; >>> >>> (*2*)DF7[x_List] := >>> Transpose[ >>> Union[Transpose[Join[{Range[Length[x]]}, {x}]], >>> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]] >>> >>> (*5*)DF8[x_List] := >>> Module[{uniq = Union[x], n, portion}, n = Length[uniq]; >>> While[(Union[portion = Take[x, n]]) != uniq, n++]; >>> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}] >>> >>> (*4*)DF9[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, b, c, d} >>> >>> **TEST CODE** >>> >>> funcs = ToExpression[Names["DF*"]] >>> >>> {DF1, DF2, DF3, DF4, DF5, DF6, DF7, DF8, DF9} >>> >>> SameQ @@ Through[funcs[Table[Random[Integer, {1, 10}], {100}]]] >>> >>> True >>> >>> test[r_, n_] := >>> With[{lst = Table[Random[Integer, {1, r}], {1000}]}, >>> Timing[#[lst]][[1]]/Second & /@ Take[funcs, n]] >>> >>> >>> >>> -- >>> Allan >>> --------------------- >>> Allan Hayes >>> Mathematica Training and Consulting >>> Leicester UK >>> www.haystack.demon.co.uk >>> hay at haystack.demon.co.uk >>> Voice: +44 (0)116 271 4198 >>> Fax: +44 (0)870 164 0565 >>> >>> <BobHanlon at aol.com> wrote in message news:8gv88o$5n7 at smc.vnet.net... >>>> >>>> In a message dated 5/28/2000 11:37:53 PM, pnichols at wittenberg.edu > writes: >>>> >>>>> Below I give a function which removes duplicates from a list (as Union >>>>> does), but without sorting the result (as Union also does). More >>>>> specifically, it extracts, in order, the first instance of each > distinct >>>>> element of the list. >>>>> >>>>> Is there any simpler way to do this? It's a simple idea, but I seem > to >>>>> need seven different list-manipulation functions, including 3 uses of >>> Map! >>>>> >>>>> DeleteRepetitions[X_] := >>>>> Take[X, #] & /@ >>>>> Sort[First /@ >>>>> (Position[X, #] & /@ >>>>> Union[X])] // Flatten >>>>> >>>>> For example, >>>>> >>>>> In[2] := DeleteRepetitions[{3,1,2,3,3,2,4,1}] >>>>> >>>>> Out[2] = {3,1,2,4} >>>>> >>>>> In[3] := DeleteRepetitions[{b,a,b,a,c,a}] >>>>> >>>>> Out[3] = {b,a,c} >>>>> >>>>> I don't need to use this function on lists longer that 20 or so > elements, >>>>> so speed is not a critical concern. >>>>> >>>>> Also, my version happens to work on expressions with heads other than >>> List >>>>> (because Take, Position, Union, and Flatten all do so), but I don't >>> really >>>>> need that feature. >>>>> >>>>> How would you implement this function? >>>>> >>>> >>>> DeleteRepetitions1[x_List] := >>>> >>>> Take[x, #] & /@ >>>> >>>> Sort[First /@ >>>> (Position[x, #] & /@ >>>> Union[x])] // >>>> Flatten >>>> >>>> DeleteRepetitions2[x_List] := Module[{uniq = {}}, >>>> If[Not[MemberQ[uniq, #]], (uniq = Append[uniq, #])] & /@ x; >>>> uniq] >>>> >>>> DeleteRepetitions3[x_List] := >>>> Transpose[ >>>> Union[Transpose[Join[{Range[Length[x]]}, {x}]], >>>> SameTest -> (#1[[2]] == #2[[2]] &)]][[2]] >>>> >>>> DeleteRepetitions4[x_List] := x //. {a___, b_, c___, b_, d___} -> {a, > b, >>> c, d} >>>> >>>> DeleteRepetitions5[x_List] := >>>> Module[{uniq = Union[x], n, portion}, >>>> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++]; >>>> portion //. {a___, b_, c___, b_, d___} -> {a, b, c, d}] >>>> >>>> DeleteRepetitions6[x_List] := >>>> Module[{uniq = Union[x], n, portion}, >>>> n = Length[uniq]; While[(Union[portion = Take[x, n]]) != uniq, n++]; >>>> Take[portion, #] & /@ >>>> Sort[First /@ >>>> (Position[portion, #] & /@ >>>> uniq)] // Flatten] >>>> >>>> testList = Table[Random[Integer, {1, 10}], {400}]; >>>> >>>> funcs = ToExpression[Names["DeleteRep*"]]; >>>> >>>> Demonstrating the equivalence of the functions >>>> >>>> Equal[Sequence @@ (#[testList] & /@ funcs)] >>>> >>>> True >>>> >>>> Comparing their timings >>>> >>>> Timing[#[testList]][[1]] & /@ funcs >>>> >>>> {0.01666666666665151*Second, 0.049999999999954525*Second, >>>> 0.10000000000002274*Second, 2.2999999999999545*Second, >>>> 0.016666666666765195*Second, 0.*Second} >>>> >>>> The fourth method is very slow for long lists. The fifth method speeds >>> this >>>> up provided that the list is highly redundant. The sixth method applies >>> this >>>> potential speed up to the first (original) method. >>>> >>>> Bob >>>> >>>> BobHanlon at aol.com >>>> >>> >>> >>> >> >