MathGroup Archive 2005

Re: about PATTERNS

  • To: mathgroup at smc.vnet.net
  • Subject: [mg53572] Re: about PATTERNS
  • From: Maxim <ab_def at prontomail.com>
  • Date: Tue, 18 Jan 2005 05:08:26 -0500 (EST)
  • Organization: MTU-Intel ISP
  • References: <csfffb$6gn$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

On Mon, 17 Jan 2005 04:38:03 +0000 (UTC), George peite  
<rasan1988 at hotmail.com> wrote:

> In[1]:= str = "1223322176644667983323456554";
>
> In[2]:= StringCases[str, a_ ~~ b_ ~~ b_ ~~ c_ /; a != b != c]
>
> Out[2]= {1223, 3221, 7664, 4667, 8332, 6554}
>
>   The above code extracts the pattern abbc from the string, but how
> could I add a rule to display only the substrings which match this
> pattern and also have an inverted form in the main string, so that the
> output is the following:
>
> Out[2]= {1223, 3221, 7664, 4667}
>
> thanks
> George peite
>

Using StringExpression:

In[1]:=
StringCases["1223322176644667983323456554",
   a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
       a =!= b =!= c ->
     Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
   Overlaps -> True]

Out[1]=
{"1223", "3221", "7664", "4667"}

Using RegularExpression:

In[2]:=
StringCases["1223322176644667983323456554",
   RegularExpression[
       "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
     Sequence["$1$2$2$3", "$3$2$2$1"],
   Overlaps -> True]

Out[2]=
{"1223", "3221", "7664", "4667"}
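
For readers less used to regular expressions, here is a sketch that builds
the same pattern from commented pieces. The name re is introduced only for
this illustration; since the joined string is exactly the pattern used in
In[2], the result should be the same as Out[2].

re = StringJoin[
   "(.)",                 (* group 1: a, any single character *)
   "(?!\\1)(.)",          (* group 2: b, must differ from a *)
   "\\2",                 (* b repeated *)
   "(?!\\1)(?!\\2)(.)",   (* group 3: c, must differ from a and b *)
   "(?=.*\\3\\2\\2\\1)"   (* lookahead: cbba occurs later; not consumed *)
   ];

StringCases["1223322176644667983323456554",
   RegularExpression[re] ->
     Sequence["$1$2$2$3", "$3$2$2$1"],
   Overlaps -> True]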

The lookahead constructs (?=p) and (?!p) succeed only if p does,
respectively does not, match at the current position, and neither consumes
any characters. They are explained in Help Browser -> Built-in Functions ->
Advanced Documentation -> Programming -> String Patterns in Mathematica.
Here is an example that shows why Overlaps -> True or Overlaps -> All is
required:

In[3]:=
StringCases["1223344554433221",
   a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
       a =!= b =!= c ->
     Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
   Overlaps -> True]

Out[3]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

With Overlaps -> False the pattern matches the whole string (___ is
matched to "34455443"), so the substrings inside that match are not
searched and the output would be just {"1223", "3221"}. The same input
with RegularExpression:

In[4]:=
StringCases["1223344554433221",
   RegularExpression[
       "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
     Sequence["$1$2$2$3", "$3$2$2$1"],
   Overlaps -> True]

Out[4]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

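The difference is that lookaround patterns aren't included in the match:
each regular expression match consumes only the four characters abbc. With
Overlaps -> False only a match that overlaps an earlier one is skipped
(here "2334" at positions 3-6), so the result would be {"1223", "3221",
"3445", "5443"}.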

Note also that there is some ambiguity in the problem: what should be the
result for the input string "1223221"? It contains the substring "1223"
and also its reverse "3221", but the two share the character "3" rather
than being separate substrings. If one wants to include overlaps of that
kind as well, it can be done as follows:

In[5]:=
StringCases[#,
   x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
         a =!= b =!= c &&
         !StringFreeQ[#, StringReverse@ x],
   Overlaps -> True
]&@ "12233433221"

Out[5]=
{"1223", "2334", "4332", "3221"}
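
Applied to the string "1223221" from the question above, the same test can
be wrapped in a function. This is a sketch; the name abbcWithReverse is
hypothetical and not part of Mathematica.

abbcWithReverse[s_String] :=
  StringCases[s,
   x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
     a =!= b =!= c && ! StringFreeQ[s, StringReverse[x]],
   Overlaps -> True]

abbcWithReverse["1223221"]
(* "1223" and "3221" share the middle "3"; both should be reported, *)
(* i.e. {"1223", "3221"} *)

The pattern from In[1] cannot match this string at all, since it needs at
least eight characters, so it would return an empty list here.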

In[6]:=
StringCases["12233433221",
   RegularExpression[
     "(?=((.)(?!\\2)(.)\\3(?!\\2)(?!\\3)(.)))(?=.*\\4\\3\\3\\2)"] :>
     Sequence["$1", StringReverse@ "$1"]]

Out[6]=
{"1223", "3221", "2334", "4332"}

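A subtle point is that (?=(p)) creates a numbered capturing group that can
be used later, as a backreference in the pattern or as "$n" in the
replacement, even though p is not included in the match. Also note that
Overlaps -> True is not needed in In[6]: every match there is empty, so no
two matches overlap.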
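
To see a capturing lookahead in isolation, consider this minimal sketch
(the test string "abcd" is made up for illustration). Each match is empty,
but group 1 still holds the two characters that follow the match position,
so the call is intended to list every two-character window, overlapping
ones included, without setting Overlaps.

StringCases["abcd", RegularExpression["(?=(..))"] -> "$1"]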

Maxim Rytin
m.r at inbox.ru

