Re: about PATTERNS
*To*: mathgroup at smc.vnet.net
*Subject*: [mg53572] Re: about PATTERNS
*From*: Maxim <ab_def at prontomail.com>
*Date*: Tue, 18 Jan 2005 05:08:26 -0500 (EST)
*Organization*: MTU-Intel ISP
*References*: <csfffb$6gn$1@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com
On Mon, 17 Jan 2005 04:38:03 +0000 (UTC), George peite
<rasan1988 at hotmail.com> wrote:
> In[1]:= str = "1223322176644667983323456554";
>
> In[2]:= StringCases[str, a_ ~~ b_ ~~ b_ ~~ c_ /; a != b != c]
>
> Out[2]= {1223, 3221, 7664, 4667, 8332, 6554}
>
> The above code extracts the pattern abbc from the string, but how
> could I add a rule to display only the substrings which match this
> pattern and also have an inverted form in the main string, so that
> the output gives the following:
>
> Out[2]= {1223, 3221, 7664, 4667}
>
> thanks
> George peite
>
Using StringExpression:
In[1]:=
StringCases["1223322176644667983323456554",
a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
a =!= b =!= c ->
Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
Overlaps -> True]
Out[1]=
{"1223", "3221", "7664", "4667"}
Using RegularExpression:
In[2]:=
StringCases["1223322176644667983323456554",
RegularExpression[
"(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
Sequence["$1$2$2$3", "$3$2$2$1"],
Overlaps -> True]
Out[2]=
{"1223", "3221", "7664", "4667"}
The lookahead constructs (?=p) and (?!p) are explained in Help Browser ->
Built-in Functions -> Advanced Documentation -> Programming -> String
Patterns in Mathematica. Here is an example that clarifies why Overlaps ->
True or Overlaps -> All is required:
In[3]:=
StringCases["1223344554433221",
a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
a =!= b =!= c ->
Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
Overlaps -> True]
Out[3]=
{"1223", "3221", "2334", "4332", "3445", "5443"}
With Overlaps -> False the pattern matches the whole string (___ is
matched to "34455443") and the substrings aren't searched, so the output
would be just {"1223", "3221"}. And similarly for RegularExpression:
In[4]:=
StringCases["1223344554433221",
RegularExpression[
"(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
Sequence["$1$2$2$3", "$3$2$2$1"],
Overlaps -> True]
Out[4]=
{"1223", "3221", "2334", "4332", "3445", "5443"}
The difference is that lookaround patterns aren't included in the match,
so with Overlaps -> False the result would be {"1223", "3221", "3445",
"5443"}.
Note also that there is some ambiguity in the problem: what should the
result be for the input string "1223221"? It contains the substring "1223"
and also its reverse "3221", but not as two separate substrings. If one
wants to include overlaps of that kind as well, it can be done as
follows:
In[5]:=
StringCases[#,
x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
a =!= b =!= c &&
!StringFreeQ[#, StringReverse@ x],
Overlaps -> True
]&@ "12233433221"
Out[5]=
{"1223", "2334", "4332", "3221"}
In[6]:=
StringCases["12233433221",
RegularExpression[
"(?=((.)(?!\\2)(.)\\3(?!\\2)(?!\\3)(.)))(?=.*\\4\\3\\3\\2)"] :>
Sequence["$1", StringReverse@ "$1"]]
Out[6]=
{"1223", "3221", "2334", "4332"}
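A subtle point is that (?=(p)) creates a numbered subpattern
(backreference) for further use, even though p is not included in the
match. Also note that Overlaps -> True is not needed in the last example:
the whole pattern consists of lookaheads, so each match has zero length
and successive matches cannot overlap.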
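The effect of a capturing group inside a lookahead can be seen in isolation on a smaller case. The following is only a sketch; the test string "abcd" and the two-character pattern are made up for illustration and do not come from the original problem.

```mathematica
(* Capturing group inside a lookahead: the match itself is empty,
   but $1 still refers to the two characters seen by the lookahead. *)
StringCases["abcd", RegularExpression["(?=(..))"] -> "$1"]

(* For comparison, the same group outside a lookahead consumes its
   two characters, so the next match can only start after them. *)
StringCases["abcd", RegularExpression["(..)"] -> "$1"]
```

If the behaviour described above holds, the first call reports every two-character window, including overlapping ones, without Overlaps -> True, while the second reports only non-overlapping pairs.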
Maxim Rytin
m.r at inbox.ru