Re: newbie list question
- To: mathgroup at smc.vnet.net
- Subject: [mg115074] Re: newbie list question
- From: DrMajorBob <btreat1 at austin.rr.com>
- Date: Thu, 30 Dec 2010 04:11:32 -0500 (EST)
Here's a faster method that agrees with Daniel's firstPositions1 and Alberto's firstPositions3A. Clear[firstPositionFinder] firstPositionFinder[list1_List] := Dispatch@Append[Thread[list1 -> Range@Length@list1], Blank[] :> Sequence[]] firstPositionFinder[list1_List, list2_List] := Replace[list2, firstPositionFinder[list1], 1] I was surprised to find that Dispatch makes the function 100 times faster, even when the Dispatch table is used only ONCE. li1 = RandomInteger[10^4, 10^4]; li2 = RandomInteger[10^4, 10^4]; Timing[pos1 = firstPositions1[li1, li2];] {9.95975, Null} Timing[pos2 = firstPositions2[li1, li2];] {0.117774, Null} Timing[pos3A = firstPositions3A[li1, li2];] {0.046426, Null} Timing[pos3B = firstPositions3B[li1, li2];] {0.043821, Null} Timing[treat = firstPositionFinder[li1, li2];] {0.028128, Null} treat == pos3A == pos1 True Building the Dispatch table is slower than Replace, so reusing it can be very fast: Timing[dispatch = firstPositionFinder[li1];] Timing[treat2 = Replace[li2, dispatch, 1];] treat == treat2 {0.018325, Null} {0.013083, Null} True Timing@Do[Replace[RandomInteger[10^4, 10^4], dispatch, 1], {100}] {0.795038, Null} Part of that is time spent in Do and RandomInteger: Timing@Do[RandomInteger[10^4, 10^4], {100}] {0.01691, Null} Replacing Sequence[] with 0 gives a similar method that leaves 0s for elements that do not appear in the first list. Bobby On Wed, 29 Dec 2010 04:56:29 -0600, ADL <alberto.dilullo at tiscali.it> wrote: > On 28 Dic, 16:51, Daniel Lichtblau <d... at wolfram.com> wrote: >> ----- Original Message ----- >> > From: "Gareth Edwards" <gareth.edwa... at cubicmotion.com> >> > To: mathgr... at smc.vnet.net >> > Sent: Sunday, December 26, 2010 3:02:23 AM >> > Subject: newbie list question >> > Hi, >> >> > Liking Mathematica a lot, but struggling with the early part of the >> > learning curve, i.e. don't know what I don't know... >> >> > What would be the neatest syntax for finding the first location of >> > elements from one list in another? For example: >> >> > listA = { 4,5,8,2,6,4 } >> > listB = { 8,4,2 } >> >> > I would like a function to return {3,1,4} in this case (the first >> > location in A of each element in B) >> >> > Many thanks! >> >> Neatest is subject to debate. A straightforward and simple method is to >> Map Position (of element in first list) over the second list. >> >> firstPositions1[l1_, l2_] := Flatten[Map[Position[l1, #, 1, 1] &, l2]] >> >> This will be fine provided the second list is not too large; it scales >> as m1*m2 where mj is the length of list j. A method that is slower per >> element of list2, but scales as O(m1+m2), is shown below. >> >> firstPositions2[l1_, l2_] := Module[ >> {stor, elem, tab}, >> tab = ConstantArray[0, Length[l2]]; >> MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2]; >> Do[ >> elem = stor[l1[[j]]]; >> If[IntegerQ[elem], >> stor[l1[[j]]] = False; >> tab[[elem]] = j; >> ], {j, Length[l1]}]; >> tab] >> >> Quick test that these work. >> >> listA = {4, 5, 8, 2, 6, 4}; >> listB = {8, 4, 2}; >> >> In[85]:= firstPositions1[listA, listB] == >> firstPositions2[listA, listB] == >> IntegerDigits[IntegerPart[100*Pi]] >> >> Out[85]= True >> >> If the second list has 10 or so elements, the second method is >> noticeably slower than the first. But it does in fact scale much >> better. Here we compare speed when second list is 1000 elements (not >> necessarily unique). >> >> l1 = RandomInteger[10^6, 5*10^5]; >> l2 = RandomInteger[10^6, 10^3]; >> >> In[73]:= Timing[pos1 = firstPositions1[l1, l2];] >> Out[73]= {68.11000000000001, Null} >> >> In[74]:= Timing[pos2 = firstPositions2[l1, l2];] >> Out[74]= {1.933999999999969, Null} >> >> Notice there is a difference in behavior beyond speed. If the second >> list contains values not found in the first, the second result will >> have zeros in those corresponding slots. I'm guessing this is probably >> desirable behavior, as compared to omitting them altogether (since you >> won't otherwise know what element in result corresponds to what element >> in second list). >> >> In[75]:= pos1 == DeleteCases[pos2, 0] >> Out[75]= True >> >> In[76]:= Length[pos1] >> Out[76]= 379 >> >> Daniel Lichtblau >> Wolfram Research > > Starting from Lichtblau's method, I noticed that an even faster method > can be implemented by using rules. > > I take as the reference method the following, based on Position: > > firstPositions1[li1_, li2_] := > Flatten[Map[Position[li1, #, 1, 1] &, li2]] > > The second method is Lichtblau's: > > firstPositions2[l1_, l2_] := Module[ > {stor, elem, tab}, > tab = ConstantArray[0, Length[l2]]; > MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2]; > Do[ > elem = stor[l1[[j]]]; > If[IntegerQ[elem], > stor[l1[[j]]] = False; > tab[[elem]] = j; > ], {j, Length[l1]}]; > tab > ] > > Than I define a method based on rules: > > ClearAll[ListRules]; > ListRules[li_]:= > Dispatch[ > Append[ > MapIndexed[#1->First[#2]&,li], > _Integer->0 > ] > ]; > > firstPositions3A[li1_, li2_] := DeleteCases[li2 /. ListRules[li1], 0]; > firstPositions3B[li1_, li2_] := li2 /. ListRules[li1]; > > The rules associate to each element in Li1 its position. > Then, the rules are used on each element in Li2, thus replacing them > by the corresponding position il Li1. > Since only the first rule which is met in the list is applied, only > the first position is used. > Elements without position are replaced by zero (last appended rule). > Dispatch has the only effect of making rule application faster. > > The method is in two "flavours": the first deletes the zeros and the > second doesn't. > > With the small lists in the original question, all the methods return > {3,1,4}. > > Getting to bigger lists: > > In[3]:= li1=RandomInteger[10^4,10^4]; > li2=RandomInteger[10^4,10^4]; > > In[5]:= Timing[pos1=firstPositions1[li1,li2];] > Out[5]= {17.596,Null} > > In[6]:= Timing[pos2=firstPositions2[li1,li2];] > Out[6]= {0.218,Null} > > In[7]:= Timing[pos3A=firstPositions3A[li1,li2];] > Out[7]= {0.078,Null} > > In[8]:= Timing[pos3B=firstPositions3B[li1,li2];] > Out[8]= {0.094,Null} > > So, the method is faster, both with and without zeros. > > It returns results equal to the Position method: > > In[10]:= pos1==pos3A > Out[10]= True > > Instead, with the zeros, it is a little different from Lichtblau's: > In[13]:= pos2==pos3B > Out[13]= False > > In fact, the rule method associates the position to any repeated > element in list2, while firstPositions2 does it only for the one > appearing first: > li1 = {4,5,8,2,6,4}; > li2 = {8,4,2,8,7}; > > In[17]:= firstPositions1[li1,li2] > Out[17]= {3,1,4,3} > In[18]:= firstPositions2[li1,li2] > Out[18]= {3,1,4,0,0} > In[19]:= firstPositions3A[li1,li2] > Out[19]= {3,1,4,3} > In[20]:= firstPositions3B[li1,li2] > Out[20]= {3,1,4,3,0} > > I think firstPositions3B represent better what one might expect with > respect to the zeros, but it of course depends on the application. > > > ADL > -- DrMajorBob at yahoo.com