Re: about PATTERNS

*To*: mathgroup at smc.vnet.net
*Subject*: [mg53572] Re: about PATTERNS
*From*: Maxim <ab_def at prontomail.com>
*Date*: Tue, 18 Jan 2005 05:08:26 -0500 (EST)
*Organization*: MTU-Intel ISP
*References*: <csfffb$6gn$1@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

On Mon, 17 Jan 2005 04:38:03 +0000 (UTC), George peite <rasan1988 at hotmail.com> wrote:

> In[1]:= str = "1223322176644667983323456554";
>
> In[2]:= StringCases[str, a_ ~~ b_ ~~ b_ ~~ c_ /; a != b != c]
>
> Out[2]= {1223, 3221, 7664, 4667, 8332, 6554}
>
> The above code extracts the pattern abbc from the string, but how could I
> add a rule to display only the substrings which match this pattern and
> also have an inverted form in the main string, so that the output is the
> following:
>
> Out[2]= {1223, 3221, 7664, 4667}
>
> thanks
> George peite
>

Using StringExpression:

In[1]:=
StringCases["1223322176644667983323456554",
  a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
    a =!= b =!= c ->
      Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
  Overlaps -> True]

Out[1]=
{"1223", "3221", "7664", "4667"}

Using RegularExpression:

In[2]:=
StringCases["1223322176644667983323456554",
  RegularExpression[
    "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
      Sequence["$1$2$2$3", "$3$2$2$1"],
  Overlaps -> True]

Out[2]=
{"1223", "3221", "7664", "4667"}

The lookahead constructs (?=p) and (?!p) are explained in Help Browser -> Built-in Functions -> Advanced Documentation -> Programming -> String Patterns in Mathematica.

Here is an example that shows why Overlaps -> True or Overlaps -> All is required:

In[3]:=
StringCases["1223344554433221",
  a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
    a =!= b =!= c ->
      Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
  Overlaps -> True]

Out[3]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

With Overlaps -> False the pattern matches the whole string (___ is matched to "34455443"), and the substrings inside that match are never searched, so the output would be just {"1223", "3221"}.
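
To compare the two Overlaps settings side by side, the pattern from In[3] can be wrapped in a small helper that takes the setting as an argument. This is only a sketch: the name mirrorPairs is made up for illustration and is not a built-in, and the results noted in the comments are the ones stated in this post rather than independently re-run output.

```mathematica
(* Hypothetical helper (name invented for this sketch): the pattern and rule
   from In[3], with the Overlaps setting passed in as the second argument.
   Assumes a, b and c have no global values, as in the examples above. *)
mirrorPairs[str_String, ov_] :=
  StringCases[str,
    a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
      a =!= b =!= c ->
        Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
    Overlaps -> ov]

(* Only non-overlapping matches: the post states this gives {"1223", "3221"} *)
mirrorPairs["1223344554433221", False]

(* A match may start at every position: the post reports the six strings in Out[3] *)
mirrorPairs["1223344554433221", True]
```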

The same holds for the RegularExpression version:

In[4]:=
StringCases["1223344554433221",
  RegularExpression[
    "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
      Sequence["$1$2$2$3", "$3$2$2$1"],
  Overlaps -> True]

Out[4]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

The difference is that lookaround patterns aren't included in the match: only the four characters abbc are consumed, so with Overlaps -> False the result would be {"1223", "3221", "3445", "5443"}.

Note also that there is some ambiguity in the problem: what should the result be for the input string "1223221"? It contains the substring "1223" and also its reverse "3221", but not as two separate substrings. If one wants to include overlaps of that kind as well, it can be done as follows:

In[5]:=
StringCases[#,
  x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
    a =!= b =!= c && !StringFreeQ[#, StringReverse@ x],
  Overlaps -> True
]&@ "12233433221"

Out[5]=
{"1223", "2334", "4332", "3221"}

In[6]:=
StringCases["12233433221",
  RegularExpression[
    "(?=((.)(?!\\2)(.)\\3(?!\\2)(?!\\3)(.)))(?=.*\\4\\3\\3\\2)"] :>
      Sequence["$1", StringReverse@ "$1"]]

Out[6]=
{"1223", "3221", "2334", "4332"}

A subtle point is that (?=(p)) creates a numbered subpattern (backreference) for further use, even though p is not included in the match. Also note that Overlaps -> True is not needed in the last example: the whole pattern sits inside lookaheads, so each match consumes no characters.

Maxim Rytin
m.r at inbox.ru