MathGroup Archive 2009

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Pattern matching / subsequence matching, question

  • To: mathgroup at smc.vnet.net
  • Subject: [mg101467] Re: [mg101419] Pattern matching / subsequence matching, question
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Wed, 8 Jul 2009 07:09:37 -0400 (EDT)
  • References: <200907070904.FAA21966@smc.vnet.net>

heycarnut wrote:
> On pp 154-156 of the book, the example to search a sequence of digits
> for a given subsequence is shown as
> 
> FindSubsequence[lis_List, subseq_List] := Module[{p}, p =
> Partition[lis, Length[subseq], 1]; Position[p, Flatten[{___,subseq,
> ___}]]]
> 
> I'm puzzled at the use of Flatten[{___,subseq, ___}] for the pattern,
> instead of just subseq, and can't seem to see what, if anything, it
> adds to the functionality of the example. Can someone thaw my brain
> freeze?
> 
> Thanks
> Rob

Seems weird, that Flatten.

findSubsequence1[lis_List, subseq_List] := Module[{p}, p =
   Partition[lis, Length[subseq], 1];
   Position[p, Flatten[{___,subseq,___}]]]

findSubsequence2[lis_List, subseq_List] := Module[{p}, p =
   Partition[lis, Length[subseq], 1]; Position[p, subseq]]

Quick tests:

SeedRandom[11111];
n = 5*10^6;
seq = RandomInteger[{0,9},n];
subseq = {4,5,6,7,8,9};

In[71]:= Timing[posns1 = findSubsequence1[seq, subseq]]
Out[71]= {9.59954, {{158942}, {1323760}, {3227671}}}

In[72]:= Timing[posns2 = findSubsequence2[seq, subseq]]
Out[72]= {4.11837, {{158942}, {1323760}, {3227671}}}

If the sublists are at all long, you can really explode memory usage 
from the partition. An alternative that uses less memory would be to 
loop for a match on the first element of subseq, then look ahead to see 
if we have a full match and if so, record the position.

findSubsequence3[lis_List, subseq_List] := Module[
   {f=First[subseq], ls=Length[subseq], k},
   Table[
     If [lis[[j]]===f,
       For[k=2, k<=ls, k++, If [lis[[j+k-1]]=!=subseq[[k]], Break[]]];
	  If [k>ls,{j},0]
       , 0]
     ,{j,Length[lis]-ls}] /. 0:>Sequence[]
   ]

In[77]:= Timing[posns3 = findSubsequence3[seq, subseq]]
Out[77]= {10.1795, {{158942}, {1323760}, {3227671}}}

This is a bit slow; I don't expect to get a crossover with 
findSubsequence2 until we go to subsequences of maybe length 12 or so. 
Since we work with integer digit vectors, we can use Compile.

findSubsequence4C = Compile[{{lis,_Integer,1}, {subseq,_Integer,1}}, Module[
   {ls=Length[subseq], k},
   Table[
     If [lis[[j]]==subseq[[1]],
       For[k=2, k<=ls, k++, If [lis[[j+k-1]]!=subseq[[k]], Break[]]];
       If [k>ls,j,0]
       , 0]
     ,{j,Length[lis]-ls}]
   ]]

findSubsequence4[lis_List, subseq_List] :=
   Map[List,findSubsequence4C[seq, subseq] /. 0:>Sequence[]]

In[85]:= Timing[posns4 = findSubsequence4[seq, subseq]]
Out[85]= {1.65075, {{158942}, {1323760}, {3227671}}}


Daniel Lichtblau
Wolfram Research








  • Prev by Date: Re: Re: Polynomial rewriting question
  • Next by Date: compile related issue
  • Previous by thread: Re: Pattern matching / subsequence matching, question on
  • Next by thread: Re: Pattern matching / subsequence matching, question on example in