       Re: position of sequence of numbers in list

• To: mathgroup at smc.vnet.net
• Subject: [mg107132] Re: position of sequence of numbers in list
• From: "Norbert P." <bertapozar at gmail.com>
• Date: Wed, 3 Feb 2010 06:12:56 -0500 (EST)
• References: <201001301212.HAA25132@smc.vnet.net> <hk3nmu\$ahf\$1@smc.vnet.net>

```Hi Leonid,

I guess JB doesn't care about speed improvement anymore, but this is
an idea that I've been using for a week (since getting Mathematica 7)
that makes finding position in a packed array much faster. This works
only in the case when one wants to find positions of all subsequences
(see my code in In and notice that my old computer is much slower
than yours):

In:= list=RandomInteger[{1,15},3000000];
seq={3,4,5,6};
In:= r1=Flatten@Position[Partition[list,4,1],{3,4,5,6}];//Timing
Out= {4.485,Null}
In:= r2=ReplaceList[list,{u___,3,4,5,6,___}:>Length[{u}]+1];//
Timing
Out= {5.453,Null}
In:= r3=myPosition[myPartition[list,Length[seq]],seq,-1];//Timing
Out= {2.719,Null}

In:= fdz[v_]:=Rest@DeleteDuplicates@Prepend[v,0]
r4=Fold[fdz[#1 (1-Unitize[list[[#1]]-#2])]+1&,fdz[Range[Length[list]-
Length[seq]+1](1-Unitize[list[[;;-Length[seq]]]-seq[]])]
+1,Rest@seq]-Length[seq];//Timing
Out= {0.422,Null}

In:= r1==r2==r3==r4
Out= True

myPosition and myPartition are the functions from your post.
I'm essentially using DeleteDuplicates together with Unitize to find
positions of all occurrences of a specific number in an array. No
unpacking occurs so it's quite fast. You can use this to possibly
improve myPosition.

Best,
Norbert

On Jan 31, 2:57 am, Leonid Shifrin <lsh... at gmail.com> wrote:
> Hi again,
>
> In my first post one of the solutions (the compiled function) contained a
> bug:
>
> In:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>
> Out= -1
>
> which was because it ignores all but the first candidate sequences in the
> case when they overlap. Here is a modified one which is (hopefully) corre=
ct,
> if not as fast:
>
> posf2=
> Compile[{{lst,_Integer,1},{target,_Integer,1}},
>     Module[{i=1,len =Length[target],lstlen = Length[lst]},
>       While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++]=
;
>        If[i>lstlen-len+1,-1,i]]]
>
> In:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
>
> Out= 5
>
> This one will still be much superior to Partition-based implementation fo=
r
> cases when  you can expect the sequence of interest  to appear rather=
early
> in the list. Anyway, sorry for the confusion with the buggy version.
>
> By the way, should you wish to stick to Partition-based implementation, I
> think it is fair to mention that for small sequence sizes and partitionin=
g
> with a shift 1, *and*  when you have a list already in the packed array
> representation (which is possible when  your numbers are say all intege=
rs or
> all reals, but not a mix), one can implement a more efficient version tha=
n
> the built-in Partition:
>
> Clear[myPartition];
> myPartition[x_List, size_Integer] :=
>   With[{len = Length[x]},
>    Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
>
> In:= largeTestList = RandomInteger[{1, 15}, 3000000];
>
> In:= (pt = Partition[largeTestList, 2, 1]); // Timing
>
> Out= {0.521, Null}
>
> In:= (mpt = myPartition[largeTestList , 2]); // Timing
>
> Out= {0.17, Null}
>
> In:= pt == mpt
>
> Out= True
>
> The built-in Partition will start winning when the partitioning size will=
be
> around 30-50, so for long sequences using a built-in is better. If your l=
ist
> is not packed, built-in Partition  will be a few times faster even for =
small
> partitioning sizes, so this will then be a de-optimization. You can attem=
pt
> to convert your list to packed array with Developer`ToPackedArray (note t=
hat
> it does not issue any messages in case it is unable to do this), and chec=
k
> that your list is packed with Developer`PackedArrayQ.
>
> Likewise, you can implement your own Position-like function aimed at exac=
tly
> this problem, which, under similar requirements of your list being a pack=
ed
> array of numbers, will be better than the built-in Position in most cases=
:
>
> In:=
> Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>   1] // Timing
>
> Out= {0.27, {{41940}}}
>
> In:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>   1] // Timing
>
> Out= {0.09, {41940}}
>
> In:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
>   2] // Timing
>
> Out= {0.301, {{41940}, {228293}}}
>
> In:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
>   2] // Timing
>
> Out= {0.311, {41940, 228293}}
>
> In:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // Timing
>
> Out= {0.55, {{41940}, {228293}, {269300}}}
>
> In:= myPosition[
>   myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
>
> Out= {0.411, {41940, 228293, 269300}}
>
> where the last argument -1 is a convention to return all results.
>
> Regards,
> Leonid
>
> On Sat, Jan 30, 2010 at 4:12 AM, JB <jke... at gmail.com> wrote:
> > Hi,
>
> > What is the most efficient way to find the position of the beginning
> > of a sequence of numbers from a list?
>
> > I found a couple of ways:
>
> > find 3,4 in list={1,2,3,4,5};
>
> >  1.   pos=Intersection[Position[list,3],(Position[list,4])+1]
>
> >  2.   pos=Position[Partition[list,2,1],{3,4}]
>
> > Are there other ways to do this?
> > What is the best way when dealing with large lists?
>
> > Thanks,
> > JB
>
>

```

• Prev by Date: Re: Can Mathematica solve this differential equation ?
• Next by Date: Re: Re: Numerical Problem
• Previous by thread: Re: position of sequence of numbers in list
• Next by thread: Re: position of sequence of numbers in list