MathGroup Archive 2005

Re: about PATTERNS

  • To: mathgroup at smc.vnet.net
  • Subject: [mg53688] Re: about PATTERNS
  • From: "wouter meeussen" <wouter.meeussen at pandora.be>
  • Date: Sun, 23 Jan 2005 02:02:18 -0500 (EST)
  • References: <csfffb$6gn$1@smc.vnet.net> <csios7$np0$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

It may be worthwhile to note that version 4.0, which lacks such string pattern tools, leads to:

In[1]:=str
Out[1]="1223322176644667983323456554"
In[2]:=Characters[str]
Out[2]=
{"1", "2", "2", "3", "3", "2", "2", "1", "7", "6", "6", "4", "4", "6", "6", \
"7", "9", "8", "3", "3", "2", "3", "4", "5", "6", "5", "5", "4"}
In[3]:=
t = ReplaceList[Characters[str], {___, a_, b_, b_, c_, ___} -> {a, b, b, c}]
Out[3]=
{{"1", "2", "2", "3"}, {"2", "3", "3", "2"}, {"3", "2", "2", "1"}, {"7", "6",
    "6", "4"}, {"6", "4", "4", "6"}, {"4", "6", "6", "7"}, {"8", "3", "3",
    "2"}, {"6", "5", "5", "4"}}
In[4]:=DeleteCases[t, Alternatives @@ Complement[t, Reverse /@ t ] ]
Out[4]=
{{"1", "2", "2", "3"}, {"2", "3", "3", "2"}, {"3", "2", "2", "1"}, {"7", "6",
    "6", "4"}, {"6", "4", "4", "6"}, {"4", "6", "6", "7"}}

where the overlap of 1223 (pos 1-4) and 2332 (pos 3-6) leads to results that differ from those of version 5 'StringCases'.
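
A minimal sketch of one way to bring the list-based approach closer to the requested output, assuming that palindromic quadruples such as 2332 and 6446 are unwanted: add a distinctness test to the ReplaceList pattern. The name quads is made up for this illustration, and the code has not been checked on version 4.0 itself.

```mathematica
str = "1223322176644667983323456554";

(* keep only quadruples a b b c in which a, b and c are pairwise distinct *)
quads = ReplaceList[Characters[str],
   {___, a_, b_, b_, c_, ___} /; a =!= b && b =!= c && a =!= c :> {a, b, b, c}];

(* drop every quadruple whose reverse does not occur in the list *)
DeleteCases[quads, Alternatives @@ Complement[quads, Reverse /@ quads]]
```

With the palindromes filtered out, this should reduce to the four quadruples 1223, 3221, 7664 and 4667.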

Wouter.


----- Original Message -----
From: "Maxim" <ab_def at prontomail.com>
To: mathgroup at smc.vnet.net
Subject: [mg53688] Re: about PATTERNS


On Mon, 17 Jan 2005 04:38:03 +0000 (UTC), George peite
<rasan1988 at hotmail.com> wrote:

> In[1]:= str = "1223322176644667983323456554";
>
> In[2]:= StringCases[str, a_ ~~ b_ ~~ b_ ~~ c_ /; a != b != c]
>
> Out[2]= {1223, 3221, 7664, 4667, 8332, 6554}
>
> The above code will extract the pattern abbc from the above string,
> but how could I add a rule to display only the substrings which match
> this pattern and also have an inverted form in the main string, so that
> the output will give the following:
>
> Out[2]= {1223, 3221, 7664, 4667}
>
> thanks
> George peite
>

Using StringExpression:

In[1]:=
StringCases["1223322176644667983323456554",
   a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
       a =!= b =!= c ->
     Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
   Overlaps -> True]

Out[1]=
{"1223", "3221", "7664", "4667"}

Using RegularExpression:

In[2]:=
StringCases["1223322176644667983323456554",
   RegularExpression[
       "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
     Sequence["$1$2$2$3", "$3$2$2$1"],
   Overlaps -> True]

Out[2]=
{"1223", "3221", "7664", "4667"}

The lookahead constructs (?=p) and (?!p) are explained in Help Browser ->
Built-in Functions -> Advanced Documentation -> Programming -> String
Patterns in Mathematica. Here is an example that clarifies why Overlaps ->
True or Overlaps -> All is required:

In[3]:=
StringCases["1223344554433221",
   a_ ~~ b_ ~~ b_ ~~ c_ ~~ ___ ~~ c_ ~~ b_ ~~ b_ ~~ a_ /;
       a =!= b =!= c ->
     Sequence[a ~~ b ~~ b ~~ c, c ~~ b ~~ b ~~ a],
   Overlaps -> True]

Out[3]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

With Overlaps -> False the pattern matches the whole string (___ is
matched to "34455443") and the substrings aren't searched, so the output
would be just {"1223", "3221"}. And similarly for RegularExpression:

In[4]:=
StringCases["1223344554433221",
   RegularExpression[
       "(.)(?!\\1)(.)\\2(?!\\1)(?!\\2)(.)(?=.*\\3\\2\\2\\1)"] ->
     Sequence["$1$2$2$3", "$3$2$2$1"],
   Overlaps -> True]

Out[4]=
{"1223", "3221", "2334", "4332", "3445", "5443"}

The difference is that lookaround patterns aren't included in the match,
so with Overlaps -> False the result would be {"1223", "3221", "3445",
"5443"}.
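
The distinction between consumed text and lookahead text can be seen on a smaller input. This is only a sketch; the toy string "aaa" is an assumption chosen for illustration and does not come from the original question.

```mathematica
(* "aa" consumes both characters, so the scan resumes after them *)
StringCases["aaa", RegularExpression["aa"]]
(* {"aa"} *)

(* the lookahead (?=a) only tests the next character; the match is one "a" *)
StringCases["aaa", RegularExpression["a(?=a)"]]
(* {"a", "a"} *)

(* Overlaps -> True restarts the search at every position instead *)
StringCases["aaa", RegularExpression["aa"], Overlaps -> True]
(* {"aa", "aa"} *)
```

In the second call the character tested by the lookahead is still available as the start of the next match, which is why the two approaches can give different results with Overlaps -> False.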

Note also that there is some ambiguity in the problem: what should be the
result for the input string "1223221"? It contains the substring "1223"
and also its inverse "3221", but not as two separate substrings. If one
wants to include overlaps of that kind as well, it can be done as
follows:

In[5]:=
StringCases[#,
   x : (a_ ~~ b_ ~~ b_ ~~ c_) /;
         a =!= b =!= c &&
         !StringFreeQ[#, StringReverse@ x],
   Overlaps -> True
]&@ "12233433221"

Out[5]=
{"1223", "2334", "4332", "3221"}

In[6]:=
StringCases["12233433221",
   RegularExpression[
     "(?=((.)(?!\\2)(.)\\3(?!\\2)(?!\\3)(.)))(?=.*\\4\\3\\3\\2)"] :>
     Sequence["$1", StringReverse@ "$1"]]

Out[6]=
{"1223", "3221", "2334", "4332"}

A subtle point is that (?=(p)) creates a numbered subpattern
(backreference) for further use, even though p is not included in the
match. Also note that Overlaps -> True is not needed in the last example.
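
A stripped-down sketch of the same capture-inside-lookahead idea, with the toy string "abcd" as an assumed input:

```mathematica
(* zero-width match: the lookahead captures two characters as group 1
   without consuming them, so the next attempt starts one character later *)
StringCases["abcd", RegularExpression["(?=(..))"] -> "$1"]
```

If this behaves like In[6], each position contributes its own two-character window, so overlapping windows appear without Overlaps -> True.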

Maxim Rytin
m.r at inbox.ru



