Re: newbie list question

• To: mathgroup at smc.vnet.net
• Subject: [mg115074] Re: newbie list question
• From: DrMajorBob <btreat1 at austin.rr.com>
• Date: Thu, 30 Dec 2010 04:11:32 -0500 (EST)

```Here's a faster method that agrees with Daniel's firstPositions1 and
Alberto's firstPositions3A.

Clear[firstPositionFinder]
firstPositionFinder[list1_List] :=
Blank[] :> Sequence[]]
firstPositionFinder[list1_List, list2_List] :=
Replace[list2, firstPositionFinder[list1], 1]

I was surprised to find that Dispatch makes the function 100 times faster,
even when the Dispatch table is used only ONCE.

li1 = RandomInteger[10^4, 10^4];
li2 = RandomInteger[10^4, 10^4];

Timing[pos1 = firstPositions1[li1, li2];]

{9.95975, Null}

Timing[pos2 = firstPositions2[li1, li2];]

{0.117774, Null}

Timing[pos3A = firstPositions3A[li1, li2];]

{0.046426, Null}

Timing[pos3B = firstPositions3B[li1, li2];]

{0.043821, Null}

Timing[treat = firstPositionFinder[li1, li2];]

{0.028128, Null}

treat == pos3A == pos1

True

Building the Dispatch table is slower than Replace, so reusing it can be
very fast:

Timing[dispatch = firstPositionFinder[li1];]
Timing[treat2 = Replace[li2, dispatch, 1];]
treat == treat2

{0.018325, Null}

{0.013083, Null}

True

Timing@Do[Replace[RandomInteger[10^4, 10^4], dispatch, 1], {100}]

{0.795038, Null}

Part of that is time spent in Do and RandomInteger:

Timing@Do[RandomInteger[10^4, 10^4], {100}]

{0.01691, Null}

Replacing Sequence[] with 0 gives a similar method that leaves 0s for
elements that do not appear in the first list.

Bobby

On Wed, 29 Dec 2010 04:56:29 -0600, ADL <alberto.dilullo at tiscali.it> wrote:

> On 28 Dic, 16:51, Daniel Lichtblau <d... at wolfram.com> wrote:
>> ----- Original Message -----
>> > From: "Gareth Edwards" <gareth.edwa... at cubicmotion.com>
>> > To: mathgr... at smc.vnet.net
>> > Sent: Sunday, December 26, 2010 3:02:23 AM
>> > Subject:  newbie list question
>> > Hi,
>>
>> > Liking Mathematica a lot, but struggling with the early part of the
>> > learning curve, i.e. don't know what I don't know...
>>
>> > What would be the neatest syntax for finding the first location of
>> > elements from one list in another? For example:
>>
>> > listA = { 4,5,8,2,6,4 }
>> > listB = { 8,4,2 }
>>
>> > I would like a function to return {3,1,4} in this case (the first
>> > location in A of each element in B)
>>
>> > Many thanks!
>>
>> Neatest is subject to debate. A straightforward and simple method is to
>> Map Position (of element in first list) over the second list.
>>
>> firstPositions1[l1_, l2_] := Flatten[Map[Position[l1, #, 1, 1] &, l2]]
>>
>> This will be fine provided the second list is not too large; it scales
>> as m1*m2 where mj is the length of list j. A method that is slower per
>> element of list2, but scales as O(m1+m2), is shown below.
>>
>> firstPositions2[l1_, l2_] := Module[
>>   {stor, elem, tab},
>>   tab = ConstantArray[0, Length[l2]];
>>   MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2];
>>   Do[
>>    elem = stor[l1[[j]]];
>>    If[IntegerQ[elem],
>>     stor[l1[[j]]] = False;
>>     tab[[elem]] = j;
>>     ], {j, Length[l1]}];
>>   tab]
>>
>> Quick test that these work.
>>
>> listA = {4, 5, 8, 2, 6, 4};
>> listB = {8, 4, 2};
>>
>> In[85]:= firstPositions1[listA, listB] ==
>>  firstPositions2[listA, listB] ==
>>  IntegerDigits[IntegerPart[100*Pi]]
>>
>> Out[85]= True
>>
>> If the second list has 10 or so elements, the second method is
>> noticeably slower than the first. But it does in fact scale much
>> better. Here we compare speed when second list is 1000 elements (not
>> necessarily unique).
>>
>> l1 = RandomInteger[10^6, 5*10^5];
>> l2 = RandomInteger[10^6, 10^3];
>>
>> In[73]:= Timing[pos1 = firstPositions1[l1, l2];]
>> Out[73]= {68.11000000000001, Null}
>>
>> In[74]:= Timing[pos2 = firstPositions2[l1, l2];]
>> Out[74]= {1.933999999999969, Null}
>>
>> Notice there is a difference in behavior beyond speed. If the second
>> list contains values not found in the first, the second result will
>> have zeros in those corresponding slots. I'm guessing this is probably
>> desirable behavior, as compared to omitting them altogether (since you
>> won't otherwise know what element in result corresponds to what element
>> in second list).
>>
>> In[75]:= pos1 == DeleteCases[pos2, 0]
>> Out[75]= True
>>
>> In[76]:= Length[pos1]
>> Out[76]= 379
>>
>> Daniel Lichtblau
>> Wolfram Research
>
> Starting from Lichtblau's method, I noticed that an even faster method
> can be implemented by using rules.
>
> I take as the reference method the following, based on Position:
>
> firstPositions1[li1_, li2_] :=
>  Flatten[Map[Position[li1, #, 1, 1] &, li2]]
>
> The second method is Lichtblau's:
>
> firstPositions2[l1_, l2_] := Module[
>   {stor, elem, tab},
>   tab = ConstantArray[0, Length[l2]];
>   MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2];
>   Do[
>    elem = stor[l1[[j]]];
>    If[IntegerQ[elem],
>     stor[l1[[j]]] = False;
>     tab[[elem]] = j;
>     ], {j, Length[l1]}];
>   tab
> ]
>
> Than I define a method based on rules:
>
> ClearAll[ListRules];
> ListRules[li_]:=
>   Dispatch[
>     Append[
>        MapIndexed[#1->First[#2]&,li],
>        _Integer->0
>     ]
>   ];
>
> firstPositions3A[li1_, li2_] := DeleteCases[li2 /. ListRules[li1], 0];
> firstPositions3B[li1_, li2_] := li2 /. ListRules[li1];
>
> The rules associate to each element in Li1 its position.
> Then, the rules are used on each element in Li2, thus replacing them
> by the corresponding position il Li1.
> Since only the first rule which is met in the list is applied, only
> the first position is used.
> Elements without position are replaced by zero (last appended rule).
> Dispatch has the only effect of making rule application faster.
>
> The method is in two "flavours": the first deletes the zeros and the
> second doesn't.
>
> With the small lists in the original question, all the methods return
> {3,1,4}.
>
> Getting to bigger lists:
>
> In[3]:= li1=RandomInteger[10^4,10^4];
> li2=RandomInteger[10^4,10^4];
>
> In[5]:= Timing[pos1=firstPositions1[li1,li2];]
> Out[5]= {17.596,Null}
>
> In[6]:= Timing[pos2=firstPositions2[li1,li2];]
> Out[6]= {0.218,Null}
>
> In[7]:= Timing[pos3A=firstPositions3A[li1,li2];]
> Out[7]= {0.078,Null}
>
> In[8]:= Timing[pos3B=firstPositions3B[li1,li2];]
> Out[8]= {0.094,Null}
>
> So, the method is faster, both with and without zeros.
>
> It returns results equal to the Position method:
>
> In[10]:= pos1==pos3A
> Out[10]= True
>
> Instead, with the zeros, it is a little different from Lichtblau's:
> In[13]:= pos2==pos3B
> Out[13]= False
>
> In fact, the rule method associates the position to any repeated
> element in list2, while firstPositions2 does it only for the one
> appearing first:
> li1 = {4,5,8,2,6,4};
> li2 = {8,4,2,8,7};
>
> In[17]:= firstPositions1[li1,li2]
> Out[17]= {3,1,4,3}
> In[18]:= firstPositions2[li1,li2]
> Out[18]= {3,1,4,0,0}
> In[19]:= firstPositions3A[li1,li2]
> Out[19]= {3,1,4,3}
> In[20]:= firstPositions3B[li1,li2]
> Out[20]= {3,1,4,3,0}
>
> I think firstPositions3B represent better what one might expect with
> respect to the zeros, but it of course depends on the application.
>
>