MathGroup Archive: December 2010 [00736]

[Date Index] [Thread Index] [Author Index]
Re: newbie list question
To: mathgroup at smc.vnet.net
Subject: [mg115035] Re: newbie list question
From: ADL <alberto.dilullo at tiscali.it>
Date: Wed, 29 Dec 2010 05:56:29 -0500 (EST)
References: <ifd11v$e2b$1@smc.vnet.net>
On 28 Dic, 16:51, Daniel Lichtblau <d... at wolfram.com> wrote:
> ----- Original Message -----
> > From: "Gareth Edwards" <gareth.edwa... at cubicmotion.com>
> > To: mathgr... at smc.vnet.net
> > Sent: Sunday, December 26, 2010 3:02:23 AM
> > Subject:  newbie list question
> > Hi,
>
> > Liking Mathematica a lot, but struggling with the early part of the
> > learning curve, i.e. don't know what I don't know...
>
> > What would be the neatest syntax for finding the first location of
> > elements from one list in another? For example:
>
> > listA = { 4,5,8,2,6,4 }
> > listB = { 8,4,2 }
>
> > I would like a function to return {3,1,4} in this case (the first
> > location in A of each element in B)
>
> > Many thanks!
>
> Neatest is subject to debate. A straightforward and simple method is to Map Position (of element in first list) over the second list.
>
> firstPositions1[l1_, l2_] := Flatten[Map[Position[l1, #, 1, 1] &, l2]]
>
> This will be fine provided the second list is not too large; it scales as m1*m2 where mj is the length of list j. A method that is slower per element of list2, but scales as O(m1+m2), is shown below.
>
> firstPositions2[l1_, l2_] := Module[
>   {stor, elem, tab},
>   tab = ConstantArray[0, Length[l2]];
>   MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2];
>   Do[
>    elem = stor[l1[[j]]];
>    If[IntegerQ[elem],
>     stor[l1[[j]]] = False;
>     tab[[elem]] = j;
>     ], {j, Length[l1]}];
>   tab]
>
> Quick test that these work.
>
> listA = {4, 5, 8, 2, 6, 4};
> listB = {8, 4, 2};
>
> In[85]:= firstPositions1[listA, listB] ==
>  firstPositions2[listA, listB] ==
>  IntegerDigits[IntegerPart[100*Pi]]
>
> Out[85]= True
>
> If the second list has 10 or so elements, the second method is noticeably slower than the first. But it does in fact scale much better. Here we compare speed when second list is 1000 elements (not necessarily unique).
>
> l1 = RandomInteger[10^6, 5*10^5];
> l2 = RandomInteger[10^6, 10^3];
>
> In[73]:= Timing[pos1 = firstPositions1[l1, l2];]
> Out[73]= {68.11000000000001, Null}
>
> In[74]:= Timing[pos2 = firstPositions2[l1, l2];]
> Out[74]= {1.933999999999969, Null}
>
> Notice there is a difference in behavior beyond speed. If the second list contains values not found in the first, the second result will have zeros in those corresponding slots. I'm guessing this is probably desirable behavior, as compared to omitting them altogether (since you won't otherwise know what element in result corresponds to what element in second list).
>
> In[75]:= pos1 == DeleteCases[pos2, 0]
> Out[75]= True
>
> In[76]:= Length[pos1]
> Out[76]= 379
>
> Daniel Lichtblau
> Wolfram Research

Starting from Lichtblau's method, I noticed that an even faster method
can be implemented by using rules.

I take as the reference method the following, based on Position:

firstPositions1[li1_, li2_] :=
 Flatten[Map[Position[li1, #, 1, 1] &, li2]]

The second method is Lichtblau's:

firstPositions2[l1_, l2_] := Module[
  {stor, elem, tab},
  tab = ConstantArray[0, Length[l2]];
  MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]] &, l2];
  Do[
   elem = stor[l1[[j]]];
   If[IntegerQ[elem],
    stor[l1[[j]]] = False;
    tab[[elem]] = j;
    ], {j, Length[l1]}];
  tab
]

Than I define a method based on rules:

ClearAll[ListRules];
ListRules[li_]:=
  Dispatch[
    Append[
       MapIndexed[#1->First[#2]&,li],
       _Integer->0
    ]
  ];

firstPositions3A[li1_, li2_] := DeleteCases[li2 /. ListRules[li1], 0];
firstPositions3B[li1_, li2_] := li2 /. ListRules[li1];

The rules associate to each element in Li1 its position.
Then, the rules are used on each element in Li2, thus replacing them
by the corresponding position il Li1.
Since only the first rule which is met in the list is applied, only
the first position is used.
Elements without position are replaced by zero (last appended rule).
Dispatch has the only effect of making rule application faster.

The method is in two "flavours": the first deletes the zeros and the
second doesn't.

With the small lists in the original question, all the methods return
{3,1,4}.

Getting to bigger lists:

In[3]:= li1=RandomInteger[10^4,10^4];
li2=RandomInteger[10^4,10^4];

In[5]:= Timing[pos1=firstPositions1[li1,li2];]
Out[5]= {17.596,Null}

In[6]:= Timing[pos2=firstPositions2[li1,li2];]
Out[6]= {0.218,Null}

In[7]:= Timing[pos3A=firstPositions3A[li1,li2];]
Out[7]= {0.078,Null}

In[8]:= Timing[pos3B=firstPositions3B[li1,li2];]
Out[8]= {0.094,Null}

So, the method is faster, both with and without zeros.

It returns results equal to the Position method:

In[10]:= pos1==pos3A
Out[10]= True

Instead, with the zeros, it is a little different from Lichtblau's:
In[13]:= pos2==pos3B
Out[13]= False

In fact, the rule method associates the position to any repeated
element in list2, while firstPositions2 does it only for the one
appearing first:
li1 = {4,5,8,2,6,4};
li2 = {8,4,2,8,7};

In[17]:= firstPositions1[li1,li2]
Out[17]= {3,1,4,3}
In[18]:= firstPositions2[li1,li2]
Out[18]= {3,1,4,0,0}
In[19]:= firstPositions3A[li1,li2]
Out[19]= {3,1,4,3}
In[20]:= firstPositions3B[li1,li2]
Out[20]= {3,1,4,3,0}

I think firstPositions3B represent better what one might expect with
respect to the zeros, but it of course depends on the application.


ADL
Prev by Date: Re: newbie list question
Next by Date: Re: typesetting problems or bugs? need a professional stylesheet
Previous by thread: Re: newbie list question
Next by thread: Re: newbie list question