Re: position of sequence of numbers in list
- To: mathgroup at smc.vnet.net
- Subject: [mg107132] Re: position of sequence of numbers in list
- From: "Norbert P." <bertapozar at gmail.com>
- Date: Wed, 3 Feb 2010 06:12:56 -0500 (EST)
- References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>
Hi Leonid, I guess JB doesn't care about speed improvement anymore, but this is an idea that I've been using for a week (since getting Mathematica 7) that makes finding position in a packed array much faster. This works only in the case when one wants to find positions of all subsequences (see my code in In[6] and notice that my old computer is much slower than yours): In[1]:= list=RandomInteger[{1,15},3000000]; seq={3,4,5,6}; In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing Out[3]= {4.485,Null} In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];// Timing Out[4]= {5.453,Null} In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing Out[5]= {2.719,Null} In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0] r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]- Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])] +1,Rest@seq]-Length[seq];//Timing Out[7]= {0.422,Null} In[8]:= r1==r2==r3==r4 Out[8]= True myPosition and myPartition are the functions from your post. I'm essentially using DeleteDuplicates together with Unitize to find positions of all occurrences of a specific number in an array. No unpacking occurs so it's quite fast. You can use this to possibly improve myPosition. Best, Norbert On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote: > Hi again, > > In my first post one of the solutions (the compiled function) contained a > bug: > > In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}] > > Out[1]= -1 > > which was because it ignores all but the first candidate sequences in the > case when they overlap. Here is a modified one which is (hopefully) corre= ct, > if not as fast: > > posf2= > Compile[{{lst,_Integer,1},{target,_Integer,1}}, > Module[{i=1,len =Length[target],lstlen = Length[lst]}, > While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++]= ; > If[i>lstlen-len+1,-1,i]]] > > In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}] > > Out[3]= 5 > > This one will still be much superior to Partition-based implementation fo= r > cases when you can expect the sequence of interest to appear rather= early > in the list. Anyway, sorry for the confusion with the buggy version. > > By the way, should you wish to stick to Partition-based implementation, I > think it is fair to mention that for small sequence sizes and partitionin= g > with a shift 1, *and* when you have a list already in the packed array > representation (which is possible when your numbers are say all intege= rs or > all reals, but not a mix), one can implement a more efficient version tha= n > the built-in Partition: > > Clear[myPartition]; > myPartition[x_List, size_Integer] := > With[{len = Length[x]}, > Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]]; > > In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000]; > > In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing > > Out[5]= {0.521, Null} > > In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing > > Out[6]= {0.17, Null} > > In[7]:= pt == mpt > > Out[7]= True > > The built-in Partition will start winning when the partitioning size will= be > around 30-50, so for long sequences using a built-in is better. If your l= ist > is not packed, built-in Partition will be a few times faster even for = small > partitioning sizes, so this will then be a de-optimization. You can attem= pt > to convert your list to packed array with Developer`ToPackedArray (note t= hat > it does not issue any messages in case it is unable to do this), and chec= k > that your list is packed with Developer`PackedArrayQ. > > Likewise, you can implement your own Position-like function aimed at exac= tly > this problem, which, under similar requirements of your list being a pack= ed > array of numbers, will be better than the built-in Position in most cases= : > > In[8]:= > Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1, > 1] // Timing > > Out[8]= {0.27, {{41940}}} > > In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6}, > 1] // Timing > > Out[9]= {0.09, {41940}} > > In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1, > 2] // Timing > > Out[10]= {0.301, {{41940}, {228293}}} > > In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6}, > 2] // Timing > > Out[11]= {0.311, {41940, 228293}} > > In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // Timing > > Out[12]= {0.55, {{41940}, {228293}, {269300}}} > > In[13]:= myPosition[ > myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing > > Out[13]= {0.411, {41940, 228293, 269300}} > > where the last argument -1 is a convention to return all results. > > Regards, > Leonid > > On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote: > > Hi, > > > What is the most efficient way to find the position of the beginning > > of a sequence of numbers from a list? > > > I found a couple of ways: > > > find 3,4 in list={1,2,3,4,5}; > > > 1. pos=Intersection[Position[list,3],(Position[list,4])+1] > > > 2. pos=Position[Partition[list,2,1],{3,4}] > > > Are there other ways to do this? > > What is the best way when dealing with large lists? > > > Thanks, > > JB > >