Re: Pattern matching / subsequence matching, question
- To: mathgroup at smc.vnet.net
 - Subject: [mg101467] Re: [mg101419] Pattern matching / subsequence matching, question
 - From: Daniel Lichtblau <danl at wolfram.com>
 - Date: Wed, 8 Jul 2009 07:09:37 -0400 (EDT)
 - References: <200907070904.FAA21966@smc.vnet.net>
 
heycarnut wrote:
> On pp 154-156 of the book, the example to search a sequence of digits
> for a given subsequence is shown as
> 
> FindSubsequence[lis_List, subseq_List] := Module[{p}, p =
> Partition[lis, Length[subseq], 1]; Position[p, Flatten[{___,subseq,
> ___}]]]
> 
> I'm puzzled at the use of Flatten[{___,subseq, ___}] for the pattern,
> instead of just subseq, and can't seem to see what, if anything, it
> adds to the functionality of the example. Can someone thaw my brain
> freeze?
> 
> Thanks
> Rob
Seems weird, that Flatten.
findSubsequence1[lis_List, subseq_List] := Module[{p}, p =
   Partition[lis, Length[subseq], 1];
   Position[p, Flatten[{___,subseq,___}]]]
findSubsequence2[lis_List, subseq_List] := Module[{p}, p =
   Partition[lis, Length[subseq], 1]; Position[p, subseq]]
Quick tests:
SeedRandom[11111];
n = 5*10^6;
seq = RandomInteger[{0,9},n];
subseq = {4,5,6,7,8,9};
In[71]:= Timing[posns1 = findSubsequence1[seq, subseq]]
Out[71]= {9.59954, {{158942}, {1323760}, {3227671}}}
In[72]:= Timing[posns2 = findSubsequence2[seq, subseq]]
Out[72]= {4.11837, {{158942}, {1323760}, {3227671}}}
If the sublists are at all long, you can really explode memory usage 
from the partition. An alternative that uses less memory would be to 
loop for a match on the first element of subseq, then look ahead to see 
if we have a full match and if so, record the position.
findSubsequence3[lis_List, subseq_List] := Module[
   {f=First[subseq], ls=Length[subseq], k},
   Table[
     If [lis[[j]]===f,
       For[k=2, k<=ls, k++, If [lis[[j+k-1]]=!=subseq[[k]], Break[]]];
	  If [k>ls,{j},0]
       , 0]
     ,{j,Length[lis]-ls}] /. 0:>Sequence[]
   ]
In[77]:= Timing[posns3 = findSubsequence3[seq, subseq]]
Out[77]= {10.1795, {{158942}, {1323760}, {3227671}}}
This is a bit slow; I don't expect to get a crossover with 
findSubsequence2 until we go to subsequences of maybe length 12 or so. 
Since we work with integer digit vectors, we can use Compile.
findSubsequence4C = Compile[{{lis,_Integer,1}, {subseq,_Integer,1}}, Module[
   {ls=Length[subseq], k},
   Table[
     If [lis[[j]]==subseq[[1]],
       For[k=2, k<=ls, k++, If [lis[[j+k-1]]!=subseq[[k]], Break[]]];
       If [k>ls,j,0]
       , 0]
     ,{j,Length[lis]-ls}]
   ]]
findSubsequence4[lis_List, subseq_List] :=
   Map[List,findSubsequence4C[seq, subseq] /. 0:>Sequence[]]
In[85]:= Timing[posns4 = findSubsequence4[seq, subseq]]
Out[85]= {1.65075, {{158942}, {1323760}, {3227671}}}
Daniel Lichtblau
Wolfram Research
- References:
- Pattern matching / subsequence matching, question on example in
- From: heycarnut <heycarnut@gmail.com>
 
 
 - Pattern matching / subsequence matching, question on example in