[Date Index]
[Thread Index]
[Author Index]
Re: position of sequence of numbers in list
*To*: mathgroup at smc.vnet.net
*Subject*: [mg106994] Re: [mg106976] position of sequence of numbers in list
*From*: Leonid Shifrin <lshifr at gmail.com>
*Date*: Sun, 31 Jan 2010 05:56:00 -0500 (EST)
*References*: <201001301212.HAA25132@smc.vnet.net>
Hi again,
In my first post one of the solutions (the compiled function) contained a
bug:
In[1]:= posf[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
Out[1]= -1
which was because it ignores all but the first candidate sequences in the
case when they overlap. Here is a modified one which is (hopefully) correct,
if not as fast:
posf2=
Compile[{{lst,_Integer,1},{target,_Integer,1}},
Module[{i=1,len =Length[target],lstlen = Length[lst]},
While[i<=lstlen-len+1&&Take[lst,{i,len+i-1}]!=target,i++];
If[i>lstlen-len+1,-1,i]]]
In[3]:= posf2[{1, 2, 3, 4, 4, 5, 6, 7}, {4, 5, 6}]
Out[3]= 5
This one will still be much superior to Partition-based implementation for
cases when you can expect the sequence of interest to appear rather early
in the list. Anyway, sorry for the confusion with the buggy version.
By the way, should you wish to stick to Partition-based implementation, I
think it is fair to mention that for small sequence sizes and partitioning
with a shift 1, *and* when you have a list already in the packed array
representation (which is possible when your numbers are say all integers or
all reals, but not a mix), one can implement a more efficient version than
the built-in Partition:
Clear[myPartition];
myPartition[x_List, size_Integer] :=
With[{len = Length[x]},
Transpose@Table[x[[i ;; len - size + i]], {i, 1, size}]];
In[4]:= largeTestList = RandomInteger[{1, 15}, 3000000];
In[5]:= (pt = Partition[largeTestList, 2, 1]); // Timing
Out[5]= {0.521, Null}
In[6]:= (mpt = myPartition[largeTestList , 2]); // Timing
Out[6]= {0.17, Null}
In[7]:= pt == mpt
Out[7]= True
The built-in Partition will start winning when the partitioning size will be
around 30-50, so for long sequences using a built-in is better. If your list
is not packed, built-in Partition will be a few times faster even for small
partitioning sizes, so this will then be a de-optimization. You can attempt
to convert your list to packed array with Developer`ToPackedArray (note that
it does not issue any messages in case it is unable to do this), and check
that your list is packed with Developer`PackedArrayQ.
Likewise, you can implement your own Position-like function aimed at exactly
this problem, which, under similar requirements of your list being a packed
array of numbers, will be better than the built-in Position in most cases:
In[8]:=
Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
1] // Timing
Out[8]= {0.27, {{41940}}}
In[9]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
1] // Timing
Out[9]= {0.09, {41940}}
In[10]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}, 1,
2] // Timing
Out[10]= {0.301, {{41940}, {228293}}}
In[11]:= myPosition[myPartition[largeTest , 4], {3, 4, 5, 6},
2] // Timing
Out[11]= {0.311, {41940, 228293}}
In[12]:= Position[Partition[largeTest , 4, 1], {3, 4, 5, 6}] // Timing
Out[12]= {0.55, {{41940}, {228293}, {269300}}}
In[13]:= myPosition[
myPartition[largeTest , 4], {3, 4, 5, 6}, -1] // Timing
Out[13]= {0.411, {41940, 228293, 269300}}
where the last argument -1 is a convention to return all results.
Regards,
Leonid
On Sat, Jan 30, 2010 at 4:12 AM, JB <jkerrb at gmail.com> wrote:
> Hi,
>
> What is the most efficient way to find the position of the beginning
> of a sequence of numbers from a list?
>
> I found a couple of ways:
>
> find 3,4 in list={1,2,3,4,5};
>
> 1. pos=Intersection[Position[list,3],(Position[list,4])+1]
>
> 2. pos=Position[Partition[list,2,1],{3,4}]
>
> Are there other ways to do this?
> What is the best way when dealing with large lists?
>
> Thanks,
> JB
>
>
Prev by Date:
**Re: looping**
Next by Date:
**How to combine graphics pimitives and Plot function?**
Previous by thread:
**Re: position of sequence of numbers in list**
Next by thread:
**Re: Re: position of sequence of numbers in list**
| |