Re: Need to Speed up Position[]
- To: mathgroup at smc.vnet.net
- Subject: [mg110685] Re: Need to Speed up Position[]
- From: Peter Pein <petsie at dordos.net>
- Date: Fri, 2 Jul 2010 02:56:18 -0400 (EDT)
- References: <i0i1kr$hq6$1@smc.vnet.net>
On Thu, 1 Jul 2010 12:28:11 +0000 (UTC),
Garapata <warsaw95826 at mypacks.net> wrote:
> I have a large nested list, "myList"
>
> It has 3 sublists with the following dimensions:
>
> Dimensions/@ myList
>
> {{19808, 5}, {7952, 5}, {7952, 5}}
>
> The 5th position (i.e., column) in each of the sublists has
> SQLDateTime[]s
> (This may or may not affect what I need, but I thought everyone should
> know).
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
>
> gives me the SQLDateTime[]s common to all sublists. I get 3954
> common elements.
>
> Length[myIntersection]
>
> 3954
>
> All of the above works great and runs very fast.
>
> I then find the positions in myList where all the common
> SQLDateTime[]s occur and then use Extract to pull them out into a new
> list:
>
> myPositions = Drop[(Position[myList, #] & /@ myIntersection),
> None, None, -1];
>
> myOutput = Extract[myList, #] & /@ myPositions;
>
> I end up with just what I need, which in this case gives me 3954 rows
> of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
> of each common date element and sublists myList[[2]] and myList[[3]]
> each have 2 occurrences of each common date element.
>
> The Extract[] runs very fast.
>
> My problem... the Position[] call runs very, very slowly (over 90 seconds
> on a dual core iMac).
>
> All the code together:
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
> myPositions = Drop[(Position[myList, #] & /@ myIntersection), None,
> None, -1];
> myOutput = Extract[myList, #] & /@ myPositions;
>
> So, does anyone know a way to speed up:
>
> myPositions = Drop[(Position[myList, #] & /@ myIntersection), None,
> None, -1]; ?
>
> Or can anyone suggest another approach to doing this that could run
> faster?
>
> Patterns?
> ParallelMap?
> Parallelize?
> Sorting?
> Changing SQLDateTimes to DateList[]s before calculating myPositions?
>
> Not clear what to try.
> Please advise.
>
> Thanks.
>
Hi,
if you are interested in myOutput only and do not need to keep
myPositions for later use, you can try something like:
In[1]:= myList = RandomInteger[{1, 5555}, #] & /@ {{19808, 5}, {7952, 5}, {7952, 5}};
In[2]:= Length[myIntersection = Intersection @@ myList[[All, All, 5]]]
Out[2]= 3126
In[3]:= Timing[Dimensions[
          myOutput = Split[
            Sort[Cases[myList, {___, Alternatives @@ myIntersection}, {2}],
              Last[#1] < Last[#2] &],
            Last[#1] === Last[#2] &]
        ]]
Out[3]= {11.28, {3126}}
In[4]:= myOutput[[1]]
Out[4]= {{3830, 4047, 4200, 3520, 1}, {4788, 4153, 2710, 2938, 1},
         {886, 2560, 5266, 128, 1}, {143, 218, 3189, 3672, 1},
         {190, 510, 4701, 212, 1}}
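If the Cases/Sort step above is still too slow, a variant along the following lines might be worth trying. It is only a sketch and has not been timed on the real data: the names commonQ, rows and selected are made up for illustration, and GatherBy needs Mathematica 7 or later. The idea is to replace the long Alternatives pattern by a membership test stored as definitions, and to group the matching rows with GatherBy instead of sorting with a custom comparison function.

```mathematica
(* Sketch only: commonQ, rows and selected are hypothetical helper names *)
myList = RandomInteger[{1, 5555}, #] & /@ {{19808, 5}, {7952, 5}, {7952, 5}};
myIntersection = Intersection @@ myList[[All, All, 5]];

(* membership test: True for each common element, False for everything else *)
Clear[commonQ];
commonQ[_] = False;
Scan[(commonQ[#] = True) &, myIntersection];

(* all rows of the three sublists, in their original order *)
rows = Join @@ myList;

(* keep only the rows whose last column is a common element *)
selected = Select[rows, commonQ[Last[#]] &];

(* group the rows by their last column, then order the groups by that key *)
myOutput = SortBy[GatherBy[selected, Last], #[[1, -1]] &];
```

With the real data, skip the RandomInteger line and use the existing myList. Within each group the rows keep the order in which they occur in myList, and the groups are ordered by the value in column 5, so the result should have one group per element of myIntersection.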
Peter