Re: Pattern matching / subsequence matching, question
- To: mathgroup at smc.vnet.net
- Subject: [mg101467] Re: [mg101419] Pattern matching / subsequence matching, question
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Wed, 8 Jul 2009 07:09:37 -0400 (EDT)
- References: <200907070904.FAA21966@smc.vnet.net>
heycarnut wrote: > On pp 154-156 of the book, the example to search a sequence of digits > for a given subsequence is shown as > > FindSubsequence[lis_List, subseq_List] := Module[{p}, p = > Partition[lis, Length[subseq], 1]; Position[p, Flatten[{___,subseq, > ___}]]] > > I'm puzzled at the use of Flatten[{___,subseq, ___}] for the pattern, > instead of just subseq, and can't seem to see what, if anything, it > adds to the functionality of the example. Can someone thaw my brain > freeze? > > Thanks > Rob Seems weird, that Flatten. findSubsequence1[lis_List, subseq_List] := Module[{p}, p = Partition[lis, Length[subseq], 1]; Position[p, Flatten[{___,subseq,___}]]] findSubsequence2[lis_List, subseq_List] := Module[{p}, p = Partition[lis, Length[subseq], 1]; Position[p, subseq]] Quick tests: SeedRandom[11111]; n = 5*10^6; seq = RandomInteger[{0,9},n]; subseq = {4,5,6,7,8,9}; In[71]:= Timing[posns1 = findSubsequence1[seq, subseq]] Out[71]= {9.59954, {{158942}, {1323760}, {3227671}}} In[72]:= Timing[posns2 = findSubsequence2[seq, subseq]] Out[72]= {4.11837, {{158942}, {1323760}, {3227671}}} If the sublists are at all long, you can really explode memory usage from the partition. An alternative that uses less memory would be to loop for a match on the first element of subseq, then look ahead to see if we have a full match and if so, record the position. findSubsequence3[lis_List, subseq_List] := Module[ {f=First[subseq], ls=Length[subseq], k}, Table[ If [lis[[j]]===f, For[k=2, k<=ls, k++, If [lis[[j+k-1]]=!=subseq[[k]], Break[]]]; If [k>ls,{j},0] , 0] ,{j,Length[lis]-ls}] /. 0:>Sequence[] ] In[77]:= Timing[posns3 = findSubsequence3[seq, subseq]] Out[77]= {10.1795, {{158942}, {1323760}, {3227671}}} This is a bit slow; I don't expect to get a crossover with findSubsequence2 until we go to subsequences of maybe length 12 or so. Since we work with integer digit vectors, we can use Compile. findSubsequence4C = Compile[{{lis,_Integer,1}, {subseq,_Integer,1}}, Module[ {ls=Length[subseq], k}, Table[ If [lis[[j]]==subseq[[1]], For[k=2, k<=ls, k++, If [lis[[j+k-1]]!=subseq[[k]], Break[]]]; If [k>ls,j,0] , 0] ,{j,Length[lis]-ls}] ]] findSubsequence4[lis_List, subseq_List] := Map[List,findSubsequence4C[seq, subseq] /. 0:>Sequence[]] In[85]:= Timing[posns4 = findSubsequence4[seq, subseq]] Out[85]= {1.65075, {{158942}, {1323760}, {3227671}}} Daniel Lichtblau Wolfram Research
- References:
- Pattern matching / subsequence matching, question on example in
- From: heycarnut <heycarnut@gmail.com>
- Pattern matching / subsequence matching, question on example in