MathGroup Archive: December 2010 [00791]

[Date Index] [Thread Index] [Author Index]
Re: newbie list question
To: mathgroup at smc.vnet.net
Subject: [mg115100] Re: newbie list question
From: Ray Koopman <koopman at sfu.ca>
Date: Thu, 30 Dec 2010 19:06:11 -0500 (EST)
As I said in my Dec 29 post, firstPositions2 ignores repeats in 
the second list. firstPositions3 fixes the problem but is slower.

Length@Union[l1 = Table[Random[Integer,{1, 10^6}], {5*10^5}]]
Length@Union[l2 = Table[Random[Integer,{1, 10^4}], {  10^3}]]
Timing@Length[r1 = firstPositions2[l1, l2]]
Timing@Length[r2 = firsts         [l1, l2]]
Timing@Length[r3 = firstPositions3[l1, l2]]
{r1 === r2, r1 === r3, r2 === r3}

393473
952
{3.92 Second,1000}
{0.48 Second,1000}
{8.53 Second,1000}
{False,False,True}

Length@Union[l1 = Table[Random[Integer,{1, 10^6}], {5*10^5}]]
Length@Union[l2 = Table[Random[Integer,{1, 10^6}], {  10^3}]]
Timing@Length[r1 = firstPositions2[l1, l2]]
Timing@Length[r2 = firsts         [l1, l2]]
Timing@Length[r3 = firstPositions3[l1, l2]]
{r1 === r2, r1 === r3, r2 === r3}

393371
1000
{3.92 Second,1000}
{0.49 Second,1000}
{8.1 Second,1000}
{True,True,True}

----- DrMajorBob <btreat1 at austin.rr.com> wrote:
> firsts[l1_, l2_] :=  Normal[SparseArray[Automatic, {Max[l1,l2]},
>    0, {1, {{0, Length[l1]}, List /@ l1}, Range[Length[l1]]}][[l2]]]
> 
> l1 = RandomInteger[{1, 10^6}, 5*10^5];
> l2 = RandomInteger[{1, 10^4}, 10^3];
> r1 = firstPositions2[l1, l2]; // Timing
> r2 = firsts[l1, l2]; // Timing
> r1 === r2
> 
> {1.33669, Null}
> 
> {0.089459, Null}
> 
> False
> 
> Bobby
> 
> On Thu, 30 Dec 2010 03:12:17 -0600, Ray Koopman <koopman at sfu.ca> wrote:
>> At 3:00 am on Dec 29, 2010, Carl Woll <carlw at wolfram.com> wrote:
>>> On 12/28/2010 5:53 AM, Daniel Lichtblau wrote:
>>>> ----- Original Message -----
>>>>
>>>>> From: "Gareth Edwards"<gareth.edwards at cubicmotion.com>
>>>>> To: mathgroup at smc.vnet.net
>>>>> Sent: Sunday, December 26, 2010 3:02:23 AM
>>>>> Subject:  newbie list question
>>>>> Hi,
>>>>>
>>>>> Liking Mathematica a lot, but struggling with the early part of
>>>>> the learning curve, i.e. don't know what I don't know...
>>>>>
>>>>> What would be the neatest syntax for finding the first location
>>>>> of elements from one list in another? For example:
>>>>>
>>>>> listA = { 4,5,8,2,6,4 }
>>>>> listB = { 8,4,2 }
>>>>>
>>>>> I would like a function to return {3,1,4} in this case (the first
>>>>> location in A of each element in B)
>>>>>
>>>>> Many thanks!
>>>>
>>>> Neatest is subject to debate. A straightforward and simple method
>>>> is to Map Position (of element in first list) over the second list.
>>>>
>>>> firstPositions1[l1_, l2_] := Flatten[Map[Position[l1, #, 1, 1]&, l2]]
>>>>
>>>> This will be fine provided the second list is not too large; it
>>>> scales as m1*m2 where mj is the length of list j. A method that is
>>>> slower per element of list2, but scales as O(m1+m2), is shown below.
>>>>
>>>> firstPositions2[l1_, l2_] := Module[
>>>>    {stor, elem, tab},
>>>>    tab = ConstantArray[0, Length[l2]];
>>>>    MapIndexed[If[! IntegerQ[stor[#1]], stor[#1] = #2[[1]]]&, l2];
>>>>    Do[
>>>>     elem = stor[l1[[j]]];
>>>>     If[IntegerQ[elem],
>>>>      stor[l1[[j]]] = False;
>>>>      tab[[elem]] = j;
>>>>      ], {j, Length[l1]}];
>>>>    tab]
>>>>
>>>> Quick test that these work.
>>>
>>>> listA = {4, 5, 8, 2, 6, 4};
>>>> listB = {8, 4, 2};
>>>>
>>>> In[85]:= firstPositions1[listA, listB] ==
>>>>   firstPositions2[listA, listB] ==
>>>>   IntegerDigits[IntegerPart[100*Pi]]
>>>>
>>>> Out[85]= True
>>>>
>>>> If the second list has 10 or so elements, the second method is
>>>> noticeably slower than the first. But it does in fact scale much
>>>> better. Here we compare speed when second list is 1000 elements
>>>> (not necessarily unique).
>>>>
>>>> l1 = RandomInteger[10^6, 5*10^5];
>>>> l2 = RandomInteger[10^6, 10^3];
>>>>
>>>> In[73]:= Timing[pos1 = firstPositions1[l1, l2];]
>>>> Out[73]= {68.11000000000001, Null}
>>>>
>>>> In[74]:= Timing[pos2 = firstPositions2[l1, l2];]
>>>> Out[74]= {1.933999999999969, Null}
>>>>
>>>> Notice there is a difference in behavior beyond speed. If the second
>>>> list contains values not found in the first, the second result will
>>>> have zeros in those corresponding slots. I'm guessing this is
>>>> probably desirable behavior, as compared to omitting them altogether
>>>> (since you won't otherwise know what element in result corresponds
>>>> to what element in second list).
>>>>
>>>> In[75]:= pos1 == DeleteCases[pos2, 0]
>>>> Out[75]= True
>>>>
>>>> In[76]:= Length[pos1]
>>>> Out[76]= 379
>>>>
>>>> Daniel Lichtblau
>>>> Wolfram Research
>>>
>>> If we assume that all of the list elements are positive integers,
>>> then the following is much faster:
>>>
>>> firsts[l1_, l2_] :=  Normal[SparseArray[Automatic, {Max[l1]}, 0,
>>> {1, {{0, Length[l1]}, List /@ l1}, Range[Length[l1]]}][[l2]]]
>>>
>>> Example:
>>>
>>> l1 = RandomInteger[{1, 10^6}, 5*10^5];
>>> l2 = RandomInteger[{1, 10^6}, 10^3];
>>>
>>> In[130]:= r1 = firstPositions2[l1, l2]; // Timing
>>> r2 = firsts[l1, l2]; // Timing
>>> r1 === r2
>>>
>>> Out[130]= {2.387, Null}
>>>
>>> Out[131]= {0.109, Null}
>>>
>>> Out[132]= True
>>>
>>> Carl Woll
>>> Wolfram Research
>>
>> If Max[l2] can be > Max[l1] then we need
>>
>>  firsts[l1_, l2_] :=  Normal[SparseArray[Automatic, {Max[l1,l2]},
>>   0, {1, {{0, Length[l1]}, List /@ l1}, Range[Length[l1]]}][[l2]]]
> 
> -- 
> DrMajorBob at yahoo.com
Prev by Date: Re: newbie list question
Next by Date: Style Sheet Font Family Not Applied to Standard Form Input Cell
Previous by thread: Re: newbie list question
Next by thread: Colored graph isomorphism?