MathGroup Archive 2010

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: position of sequence of numbers in list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg107132] Re: position of sequence of numbers in list
  • From: "Norbert P." <bertapozar at gmail.com>
  • Date: Wed, 3 Feb 2010 06:12:56 -0500 (EST)
  • References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu$ahf$1@smc.vnet.net>

Hi Leonid,

I guess JB doesn't care about speed improvement anymore, but this is
an idea that I've been using for a week (since getting Mathematica 7)
that makes finding position in a packed array much faster. This works
only in the case when one wants to find positions of all subsequences
(see my code in In[6] and notice that my old computer is much slower
than yours):

In[1]:= list=RandomInteger[{1,15},3000000];
seq={3,4,5,6};
In[3]:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing
Out[3]= {4.485,Null}
In[4]:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];//
Timing
Out[4]= {5.453,Null}
In[5]:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing
Out[5]= {2.719,Null}

In[6]:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0]
r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]-
Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[[1]]])]
+1,Rest@seq]-Length[seq];//Timing
Out[7]= {0.422,Null}

In[8]:= r1==r2==r3==r4
Out[8]= True

myPosition and myPartition are the functions from your post.
I'm essentially using DeleteDuplicates together with Unitize to find
positions of all occurrences of a specific number in an array. No
unpacking occurs so it's quite fast. You can use this to possibly
improve myPosition.

Best,
Norbert

On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote:
> Hi again,
>
> In my first post one of the solutions (the compiled function) contained a
> bug:
>
> In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>
> Out[1]= -1
>
> which was because it ignores all but the first candidate sequences in the
> case when they overlap. Here is a modified one which is (hopefully) corre=
ct,
> if not as fast:
>
> posf2=
> Compile[{{lst,_Integer,1},{target,_Integer,1}},
>     Module[{i=1,len =Length[target],lstlen = Length[lst]},
>       While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++]=
;
>        If[i>lstlen-len+1,-1,i]]]
>
> In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>
> Out[3]= 5
>
> This one will still be much superior to Partition-based implementation fo=
r
> cases when  you can expect the sequence of interest  to appear rather=
 early
> in the list. Anyway, sorry for the confusion with the buggy version.
>
> By the way, should you wish to stick to Partition-based implementation, I
> think it is fair to mention that for small sequence sizes and partitionin=
g
> with a shift 1, *and*  when you have a list already in the packed array
> representation (which is possible when  your numbers are say all intege=
rs or
> all reals, but not a mix), one can implement a more efficient version tha=
n
> the built-in Partition:
>
> Clear[myPartition];
> myPartition[x_List, size_Integer] :=
>   With[{len = Length[x]},
>    Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
>
> In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000];
>
> In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing
>
> Out[5]= {0.521, Null}
>
> In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing
>
> Out[6]= {0.17, Null}
>
> In[7]:= pt == mpt
>
> Out[7]= True
>
> The built-in Partition will start winning when the partitioning size will=
 be
> around 30-50, so for long sequences using a built-in is better. If your l=
ist
> is not packed, built-in Partition  will be a few times faster even for =
small
> partitioning sizes, so this will then be a de-optimization. You can attem=
pt
> to convert your list to packed array with Developer`ToPackedArray (note t=
hat
> it does not issue any messages in case it is unable to do this), and chec=
k
> that your list is packed with Developer`PackedArrayQ.
>
> Likewise, you can implement your own Position-like function aimed at exac=
tly
> this problem, which, under similar requirements of your list being a pack=
ed
> array of numbers, will be better than the built-in Position in most cases=
:
>
> In[8]:=
> Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>   1] // Timing
>
> Out[8]= {0.27, {{41940}}}
>
> In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>   1] // Timing
>
> Out[9]= {0.09, {41940}}
>
> In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>   2] // Timing
>
> Out[10]= {0.301, {{41940}, {228293}}}
>
> In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>   2] // Timing
>
> Out[11]= {0.311, {41940, 228293}}
>
> In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // Timing
>
> Out[12]= {0.55, {{41940}, {228293}, {269300}}}
>
> In[13]:= myPosition[
>   myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
>
> Out[13]= {0.411, {41940, 228293, 269300}}
>
> where the last argument -1 is a convention to return all results.
>
> Regards,
> Leonid
>
> On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote:
> > Hi,
>
> > What is the most efficient way to find the position of the beginning
> > of a sequence of numbers from a list?
>
> > I found a couple of ways:
>
> > find 3,4 in list={1,2,3,4,5};
>
> >  1.   pos=Intersection[Position[list,3],(Position[list,4])+1]
>
> >  2.   pos=Position[Partition[list,2,1],{3,4}]
>
> > Are there other ways to do this?
> > What is the best way when dealing with large lists?
>
> > Thanks,
> > JB
>
>



  • Prev by Date: Re: Can Mathematica solve this differential equation ?
  • Next by Date: Re: Re: Numerical Problem
  • Previous by thread: Re: position of sequence of numbers in list
  • Next by thread: Re: position of sequence of numbers in list