[Date Index]
[Thread Index]
[Author Index]
Re: Need to Speed up Position[]
*To*: mathgroup at smc.vnet.net
*Subject*: [mg110693] Re: Need to Speed up Position[]
*From*: Leonid Shifrin <lshifr at gmail.com>
*Date*: Fri, 2 Jul 2010 07:26:21 -0400 (EDT)
Hi,
I suggest you to use some of the functionality of the package
UnsortedOperations which I developed some time ago, and which is available
at:
http://www.mathprogramming-intro.or/additional_resources.html
1. Download the package by right-clicking on link "The package" in the
section "UnsortedOperations" and "Save as".
2. Evaluate $UserBaseDirectory in Mathematica to locate your application
directory
3. Find the folder Applications in it, and place there the package.
Then:
This loads the package as a sub-context of a Global` context (ignore the
error message)
Needs["UnsortedOperations`"]
Needs::nocont: Context UnsortedOperations` was not created when Needs was
evaluated. >>
You can call the Information (?) to see the name of the functions:
?`UnsortedOperations`*
Global`UnsortedOperations`
MapAtComplement MemberPositions PositionsOfDifferent
UnsortedComplement
MapAtIntersection MemberPositionsSequences PositionsOfSame
UnsortedIntersection
MapAtMembers MemberSequences Subsequences UnsortedUnion
You will need the <PositionOfSame> function. You can type ?PositionsOfSame
to get the usage message explaining what it does. Now:
In[3]:=
myList=RandomInteger[{1,5555},#]&/@{{19808,5},{7952,5},{7952,5}};
In[5]:= (pos = PositionsOfSame@@myList[[All,All,5]])//Short//Timing
Out[5]=
{0.991,{{{3940,5213,5845},{290,5492},{145,3142,5936,6987}},{{7151},{443},{7392}},<<3108>>,{{5697},{3910},{4057}}}}
Each sublist contains lists of positions of some element in the first,
second and third lists. There can be more than one identical element in each
list, therefore position lists referring to each of the sublists in <myList>
may have different lengths. Here we check for an arbitrary element (date)
present in all lists that these positions are consistent:
In[6]:=
someElemPositions = RandomChoice[pos]
Out[6]= {{224,1002,7634,18556},{4167,6157},{3662}}
In[7]:= myList[[1,someElemPositions[[1]]]]
Out[7]= {{4762,3324,804,1066,4935},
{4159,2940,4132,3993,4935},{2997,5423,228,2728,4935},{4389,1814,2097,890,4935}}
In[8]:= myList[[2,someElemPositions[[2]]]]
Out[8]= {{3717,635,5271,4565,4935},{1181,2864,3431,5117,4935}}
In[9]:= myList[[3,someElemPositions[[3]]]]
Out[9]= {{2026,4835,3932,3738,4935}}
This will regroup positions according to the original order of elements in
the list:
In[12]:= (positionsAccordingToOrder =
(Sort[Flatten[#]]&/@Transpose[pos]))//Shallow
Out[12]//Shallow=
{{1,2,4,5,6,9,10,11,12,13,<<11388>>},{1,3,6,8,9,10,13,14,15,16,<<5774>>},{1,2,3,4,5,6,8,9,10,11,<<5872>>}}
This will extract from your list only parts which are present in all 3
lists.
In[14]:= (listWithCOmmonElementsOnly =
MapThread[
myList[[##]] &, {Range[Length[myList]],
positionsAccordingToOrder }]) // Shallow
Out[14]//Shallow= {{{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<11388>>}, {{<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, <<5774>>}, {{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<5872>>}}
This list has the same structure as your original list, with all elements
not having counterparts with the same date in other lists removed. The
number of elements in each of the 3 sublists can be large because of allowed
repetitions of the last element (date). This list however loses the more
precise information available in the original list of positions <pos>, where
positions were grouped according to the same elements.
In any case, you get a couple of orders of magnitude speed-up. It is in fact
possible to rewrite the engine of my package so that another factor of 2-3
times speed-up is achieved, but I did not find the time to do that yet.
Hope this helps.
Regards,
Leonid
On Thu, Jul 1, 2010 at 5:28 AM, Garapata <warsaw95826 at mypacks.net> wrote:
> I have a large nested list, "myList"
>
> It has 3 sublists with the following dimensions:
>
> Dimensions/@ myList
>
> {{19808, 5}, {7952, 5}, {7952, 5}}
>
> The 5th position (i.e., column) in each of the sublists has
> SQLDateTime[]s
> (This may or may not affect what I need, but I thought everyone should
> know).
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
>
> gives me the SQLDateTimes[]s common to all sublists. I get 3954
> common elements.
>
> Length[myIntersection]
>
> 3954
>
> All of the above works great and runs very fast.
>
> I then find the positions in myList where all the common
> SQLDateTimes[]s occur and then use Extract pull them out into a new
> list
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1];
>
> myOutput = Extract[myList, #] & /@ myPositions;
>
> I end up with just what I need, which in this case gives me 3954 rows
> of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
> of each common date element and sublists myList[[2]] and myList[[3]]
> each have 2 occurrences of each common date element.
>
> The Extract[] runs very fast.
>
> My problem =85. the Position[] runs very very slow (over 90 seconds on a
> dual core iMac).
>
> All the code together:
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1];
> myOutput = Extract[myList, #] & /@ myPositions;
>
> So, does anyone know a way to speed up:
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1]; ?
>
> Or can anyone suggest another approach to doing this that could run
> faster.
>
> Patterns?
> ParallelMap?
> Parallelize?
> Sorting?
> Changing SQLDateTimes to DateList[]s before calculating myPositions?
>
> Not clear what to try.
> Please advise.
>
> Thanks.
>
>
Prev by Date:
**Re: Absolute value**
Next by Date:
**Automatic numbering and table of contents**
Previous by thread:
** Re: Need to Speed up Position[]**
Next by thread:
**Re: Need to Speed up Position[]**
| |