RE: Re: irritating little problem

*To*: mathgroup at smc.vnet.net*Subject*: [mg32968] RE: [mg32926] Re: irritating little problem*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>*Date*: Fri, 22 Feb 2002 01:48:51 -0500 (EST)*Sender*: owner-wri-mathgroup at wolfram.com

please see below: > -----Original Message----- > From: Allan Hayes [SMTP:hay at haystack.demon.co.uk] To: mathgroup at smc.vnet.net > Sent: Wednesday, February 20, 2002 7:26 AM > To: mathgroup at smc.vnet.net > Subject: [mg32926] Re: irritating little problem > > Peter, > One immediately thinks of using Position as in the attempt 1 below, but > this > involves a search of the list for every one of its values and is clearly > inefficent. > Much better is to attach the positions to the elements of the list and > then > manipulate the result - attempt 2. > > To get timings I use > > n=500 > vec = Table[Random[Integer, {0, n}], {10n}]; > > > attempt 1: > > pn1=(#->Position[vec,#])&/@Union[vec];//Timing > > {17.63 Second,Null} > > 0/.pn1 > > {{342},{581},{2162},{2281},{2315},{3051},{3071},{3239},{3351},{4425}} > > attempt2: > > pn4=(#[[1,1]]->#[[All,2]]&/@ > Split[Sort[ > Transpose[{vec, > Range[Length[vec]]}]],#1[[1]] == #2[[1]]&]);//Timing > > {0.94 Second,Null} > > 0/.pn4 > > {342,581,2162,2281,2315,3051,3071,3239,3351,4425} > > - The inner lists in attempt 1 can be removed by flattening the answer. > - We could have attached the positions to the elements in attempt 1 by > using > MapIndexed, but I think that this is slower. > - The advantage of attempt 1 increases with the size of n. > > -- > Allan > > --------------------- > Allan Hayes > Mathematica Training and Consulting > Leicester UK > www.haystack.demon.co.uk > hay at haystack.demon.co.uk > Voice: +44 (0)116 271 4198 > Fax: +44 (0)870 164 0565 > > > "KIMIC Weijnitz Peter" <micweij at eka.ericsson.se> wrote in message > news:a4sv9c$ido$1 at smc.vnet.net... > > I have a simple vector > > and I want to find the position of elements that are equal. > > > > I.e I want to test the vector and find all cases of similar elements. > > > > Brute force is not what I want, it can be a long vector. > > Best regards > > Petr W > > > [Hartmut Wolf] Dear Allan, your solution "attempt 2" is of universal importance. It reveals two important observations: (1) if there are repetions in a list, then mapping over the sorted and split list can be much faster than mapping over the original list (since it is shorter by a factor). The extra operations Range, Transpose, Sort and Split are extremly fast compared to Map, since the are deeply buried into the kernel and do not reenter expression evaluation, as Map has to do when applying the function to the argument. (2) although it introduces an (n log n) component (Sort), this is of no importance even for large lists within the confines of real memory of an average workstation. I compared your solution with a "MapIndexed" variant you mentioned (but did not show). Furthermore , I tried and succeeded in avoiding the performance penalty on Split when the test function is applied. This gives another performance boost, which adds to the value of your "design pattern". In[1]:= SeedRandom[1] In[2]:= n = 500;m = 10; vec = Table[Random[Integer, {0, n}], {m n}]; Your attempt 2 (repeated here for convenience): In[4]:= pn4 = (#[[1, 1]] -> #[[All, 2]] &) /@ Split[Sort[ Transpose[{vec, Range[Length[vec]]}]], #1[[1]] == #2[[1]] &]; // Timing Out[4]= {0.621 Second, Null} In[5]:= 0 /. pn4 Out[5]= {1015, 1449, 1520, 2487, 4072} a "MapIndexed" method In[6]:= Remove[p] In[7]:= p[_] := {} In[8]:= MapIndexed[((p[#1] = {p[#1], #2}) &), vec]; // Timing Out[8]= {1.092 Second, Null} In[9]:= pn5 = n_ :> Flatten[p[n]]; In[10]:= 0 /. pn5 Out[10]= {1015, 1449, 1520, 2487, 4072} "attempt 6", improves pn4: In[11]:= pn6 = Module[{sortvec, posns, splitvec, starts}, {sortvec, posns} = Transpose[ Sort[Transpose[{vec, Range[Length[vec]]}]]]; starts = FoldList[#1 + Length[#2] &, 1, splitvec = Split[sortvec]]; MapThread[(First[#1] -> Take[posns, #2] &), {splitvec, Drop[Transpose[{starts, RotateLeft[starts - 1]}], -1]} ]]; // Timing Out[11]= {0.18 Second, Null} In[12]:= 0 /. pn6 Out[12]= {1015, 1449, 1520, 2487, 4072} Of course the Sort/Split methods will loose there advantage when there are only a few repetions. To test that, I compared timings on different probes (for n and m respectively): In[1]:= s = 5^Range[0, 6]; In[2]:= probes = Transpose[{s, Reverse[s]}] Out[2]= {{1, 15625}, {5, 3125}, {25, 625}, {125, 125}, {625, 25}, {3125, 5}, {15625, 1}} I'll just show my timing results: t4 = {1.492, 1.442, 1.432, 1.452, 1.593, 1.903, 2.784} t5 = {2.904, 3.786, 3.115, 4.406, 3.495, 5.368, 4.968} t6 = {0.191, 0.23, 0.23, 0.251, 0.391, 0.902, 2.323} It's quite illuminating to compare them in a graph: << Graphics`MultipleListPlot` MultipleListPlot[t4, t5, t6, PlotJoined -> True] -- Hartmut