Re: Pattern matching / subsequence matching, question

• To: mathgroup at smc.vnet.net
• Subject: [mg101467] Re: [mg101419] Pattern matching / subsequence matching, question
• From: Daniel Lichtblau <danl at wolfram.com>
• Date: Wed, 8 Jul 2009 07:09:37 -0400 (EDT)
• References: <200907070904.FAA21966@smc.vnet.net>

heycarnut wrote:
> On pp 154-156 of the book, the example to search a sequence of digits
> for a given subsequence is shown as
>
> FindSubsequence[lis_List, subseq_List] := Module[{p}, p =
> Partition[lis, Length[subseq], 1]; Position[p, Flatten[{___,subseq,
> ___}]]]
>
> I'm puzzled at the use of Flatten[{___,subseq, ___}] for the pattern,
> instead of just subseq, and can't seem to see what, if anything, it
> adds to the functionality of the example. Can someone thaw my brain
> freeze?
>
> Thanks
> Rob

Seems weird, that Flatten.

findSubsequence1[lis_List, subseq_List] := Module[{p}, p =
Partition[lis, Length[subseq], 1];
Position[p, Flatten[{___,subseq,___}]]]

findSubsequence2[lis_List, subseq_List] := Module[{p}, p =
Partition[lis, Length[subseq], 1]; Position[p, subseq]]

Quick tests:

SeedRandom[11111];
n = 5*10^6;
seq = RandomInteger[{0,9},n];
subseq = {4,5,6,7,8,9};

In[71]:= Timing[posns1 = findSubsequence1[seq, subseq]]
Out[71]= {9.59954, {{158942}, {1323760}, {3227671}}}

In[72]:= Timing[posns2 = findSubsequence2[seq, subseq]]
Out[72]= {4.11837, {{158942}, {1323760}, {3227671}}}

If the sublists are at all long, you can really explode memory usage
from the partition. An alternative that uses less memory would be to
loop for a match on the first element of subseq, then look ahead to see
if we have a full match and if so, record the position.

findSubsequence3[lis_List, subseq_List] := Module[
{f=First[subseq], ls=Length[subseq], k},
Table[
If [lis[[j]]===f,
For[k=2, k<=ls, k++, If [lis[[j+k-1]]=!=subseq[[k]], Break[]]];
If [k>ls,{j},0]
, 0]
,{j,Length[lis]-ls}] /. 0:>Sequence[]
]

In[77]:= Timing[posns3 = findSubsequence3[seq, subseq]]
Out[77]= {10.1795, {{158942}, {1323760}, {3227671}}}

This is a bit slow; I don't expect to get a crossover with
findSubsequence2 until we go to subsequences of maybe length 12 or so.
Since we work with integer digit vectors, we can use Compile.

findSubsequence4C = Compile[{{lis,_Integer,1}, {subseq,_Integer,1}}, Module[
{ls=Length[subseq], k},
Table[
If [lis[[j]]==subseq[[1]],
For[k=2, k<=ls, k++, If [lis[[j+k-1]]!=subseq[[k]], Break[]]];
If [k>ls,j,0]
, 0]
,{j,Length[lis]-ls}]
]]

findSubsequence4[lis_List, subseq_List] :=
Map[List,findSubsequence4C[seq, subseq] /. 0:>Sequence[]]

In[85]:= Timing[posns4 = findSubsequence4[seq, subseq]]
Out[85]= {1.65075, {{158942}, {1323760}, {3227671}}}

Daniel Lichtblau
Wolfram Research

• Prev by Date: Re: Re: Polynomial rewriting question
• Next by Date: compile related issue
• Previous by thread: Re: Pattern matching / subsequence matching, question on
• Next by thread: Re: Pattern matching / subsequence matching, question on example in