Re: about PATTERNS
- To: mathgroup at smc.vnet.net
- Subject: [mg53688] Re: about PATTERNS
- From: "wouter meeussen" <wouter.meeussen at pandora.be>
- Date: Sun, 23 Jan 2005 02:02:18 -0500 (EST)
- References: <csfffb$6gn$1@smc.vnet.net> <csios7$np0$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
It may be worthwhile to note that version 4.0, which does not have such string
pattern tools, leads to:

In[1]:= str
Out[1]= "1223322176644667983323456554"

In[2]:= Characters[str]
Out[2]= {"1", "2", "2", "3", "3", "2", "2", "1", "7", "6", "6", "4", "4", "6",
    "6", "7", "9", "8", "3", "3", "2", "3", "4", "5", "6", "5", "5", "4"}

In[3]:= t = ReplaceList[Characters[str],
    {___, a_, b_, b_, c_, ___} -> {a, b, b, c}]
Out[3]= {{"1", "2", "2", "3"}, {"2", "3", "3", "2"}, {"3", "2", "2", "1"},
    {"7", "6", "6", "4"}, {"6", "4", "4", "6"}, {"4", "6", "6", "7"},
    {"8", "3", "3", "2"}, {"6", "5", "5", "4"}}

In[4]:= DeleteCases[t, Alternatives @@ Complement[t, Reverse /@ t]]
Out[4]= {{"1", "2", "2", "3"}, {"2", "3", "3", "2"}, {"3", "2", "2", "1"},
    {"7", "6", "6", "4"}, {"6", "4", "4", "6"}, {"4", "6", "6", "7"}}

where the overlap of 1223 (pos 1-4) and 2332 (pos 3-6) leads to results that
differ from those of version 5's StringCases.

Wouter.

----- Original Message -----
From: "Maxim" <ab_def at prontomail.com>
To: mathgroup at smc.vnet.net
Subject: [mg53688] Re: about PATTERNS

On Mon, 17 Jan 2005 04:38:03 +0000 (UTC), George peite
<rasan1988 at hotmail.com> wrote:

> In[1]:= str = "1223322176644667983323456554";
>
> In[2]:= StringCases[str, a_ ~~ b_ ~~ b_ ~~ c_ /; a != b != c]
>
> Out[2]= {1223, 3221, 7664, 4667, 8332, 6554}
>
> the above code will extract the pattern abbc from the above string, but
> how I could put a rule to just display the substrings which match this
> pattern and also have an inverted form in the main string so the output
> will give the following:
>
> Out[2]= {1223, 3221, 7664, 4667}
>
> thanks
> George peite
>

Using StringExpression:

In[1]:= StringCases["1223322176644667983323456554",
    a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
        a =!= b =!= c ->
      Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
    Overlaps -> True]
Out[1]= {"1223", "3221", "7664", "4667"}

Using RegularExpression:

In[2]:= StringCases["1223322176644667983323456554",
    RegularExpression[
        "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
      Sequence["$1$2$2$3", "$3$2$2$1"],
    Overlaps -> True]
Out[2]= {"1223", "3221", "7664", "4667"}

The lookahead constructs (?=p) and (?!p) are explained in Help Browser ->
Built-in Functions -> Advanced Documentation -> Programming -> String
Patterns in Mathematica.

Here is an example that clarifies why Overlaps -> True or Overlaps -> All is
required:

In[3]:= StringCases["1223344554433221",
    a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
        a =!= b =!= c ->
      Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
    Overlaps -> True]
Out[3]= {"1223", "3221", "2334", "4332", "3445", "5443"}

With Overlaps -> False the pattern matches the whole string (___ is matched
to "34455443") and the substrings aren't searched, so the output would be
just {"1223", "3221"}. And similarly for RegularExpression:

In[4]:= StringCases["1223344554433221",
    RegularExpression[
        "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
      Sequence["$1$2$2$3", "$3$2$2$1"],
    Overlaps -> True]
Out[4]= {"1223", "3221", "2334", "4332", "3445", "5443"}

The difference is that lookaround patterns aren't included in the match, so
with Overlaps -> False the result would be {"1223", "3221", "3445", "5443"}.

Note also that there is some ambiguity in the problem: what should be the
result for the input string "1223221"? It contains the substring "1223" and
also its inverse "3221", but not as two separate substrings.
If one wants to include overlaps of that kind as well, it can be done as
follows:

In[5]:= StringCases[#,
    x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
        a =!= b =!= c && !StringFreeQ[#, StringReverse@ x],
    Overlaps -> True]&@ "12233433221"
Out[5]= {"1223", "2334", "4332", "3221"}

In[6]:= StringCases["12233433221",
    RegularExpression[
        "(?=((.)(?!\\2)(.)\\3(?!\\2)(?!\\3)(.)))(?=.*\\4\\3\\3\\2)"] :>
      Sequence["$1", StringReverse@ "$1"]]
Out[6]= {"1223", "3221", "2334", "4332"}

A subtle point is that (?=(p)) creates a numbered subpattern (backreference)
for further use, even though p is not included in the match. Also note that
Overlaps -> True is not needed in the last example.

Maxim Rytin
m.r at inbox.ru
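
The remark about (?=(p)) can be tried in isolation. The following is only a minimal sketch and is not taken from the posts above: the test string "abcd" and the two-character subpattern are arbitrary choices made for illustration, and it assumes version 5 or later, where StringCases and RegularExpression are available.

```mathematica
(* Capturing group inside a zero-width lookahead: the match itself is empty,
   but group 1 can still be referenced as "$1" on the right-hand side. *)
StringCases["abcd", RegularExpression["(?=(..))"] -> "$1"]

(* For comparison: the same two-character group without the lookahead.
   Here the matched characters are consumed, so with the default
   Overlaps -> False successive matches cannot share characters. *)
StringCases["abcd", RegularExpression["(..)"] -> "$1"]
```

If the first call behaves like In[6] above, it should report two-character pieces that share characters with their neighbours even though Overlaps is left at its default, because each match consumes nothing; the second call can only return non-overlapping pieces.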