Re: Pattern match question

• To: mathgroup at smc.vnet.net
• Subject: [mg68447] Re: Pattern match question
• From: ab_def at prontomail.com
• Date: Sat, 5 Aug 2006 03:46:50 -0400 (EDT)
• References: <easj50$fq0$1@smc.vnet.net><eauvga$16o$1@smc.vnet.net>
• Sender: owner-wri-mathgroup at wolfram.com

```
ab_def at prontomail.com wrote:
> glassymeow at yahoo.com wrote:
> > hi
> > txt = "ZACCZBNRCSAACXBXX";
> > letters = "ABC";
> > I want to find the first occurrences of any of the
> > six combinations of the letters of the set "ABC", globally and
> > without overlap. The spacing between the letters is not
> > important.
> > in the above txt string the result must be:
> > Out[]:=
> > ACCZB
> > CSAACXB
> > I would like a solution using Mathematica regular expressions.
> > The regex pattern  (A|B|C).*?(A|B|C).*?(A|B|C)  gives the output
> > ACC ,  BNRCSA  ,  ACXB   because it considers the permutations
> > and not the combinations.
> > The following is an old-fashioned program that emulates the human
> > pencil-and-paper method. It solves the problem, but I am sure there
> > are better solutions.
> >
> > txt = "ZACCZBNRCSAACXBXX";
> > letters = "ABC";
> > ptrnLtrs = "";
> > (* make a string of 26 zeros, one per letter of the alphabet *)
> > For[i = 1, i <= 26, ptrnLtrs = StringJoin[ptrnLtrs, "0"]; i++]
> > (* replace every letter of the pattern letters *)
> > (* with a corresponding 1 in the string of the zero's *)
> > For[i = 1, i <= StringLength[letters],
> >     num = ToCharacterCode[StringTake[letters, {i, i}]];
> >     num = num - 64;
> >     ptrnLtrs = StringReplacePart[ptrnLtrs, "1", Flatten[{num, num}]];
> >     i++];
> >
> > (* the procedural pattern match *)
> > ptrnLtrsBak = ptrnLtrs; y = 0;    (* backup for the ptrnLtrs *)
> > beginFlag = 0; result = ""; lst = {};
> > For[i = 1, i <= StringLength[txt],
> >     OneLetter = StringTake[txt, {i, i}];
> >     If[beginFlag == 0 && StringCases[letters, OneLetter] == {},
> >        Goto[jmp]];
> >      num = ToCharacterCode[StringTake[txt, {i, i}]] - 64;
> >      If[StringTake[ptrnLtrs, num] == "1",
> >         result = StringJoin[result, OneLetter];
> >         ptrnLtrs = StringReplacePart[ptrnLtrs, "0", Flatten[{num, num}]];
> >         , result = StringJoin[result, OneLetter];];
> >       beginFlag = 1;
> >       If[ToExpression[ptrnLtrs] == 0, ptrnLtrs = ptrnLtrsBak;
> >           Print[result];
> >           result = "";  beginFlag = 0;];
> >        Label[jmp];
> >        i++]
> >
> > Out[]:=
> > ACCZB
> > CSAACXB
> >
> > regards
> > peter glassy
>
> Here's a solution that uses string expressions:
>
> In[1]:= Module[
>   {Lpatt = StringExpression @@@ (Insert[#, ___, {{2}, {3}}]&) /@
>       Permutations@ {"A", "B", "C"}},
>   StringCases["ZACCZBNRCSAACXBXX",
>     ShortestMatch[s__] /; StringMatchQ[s, Lpatt]]]
>
> Out[1]= {"ACCZB", "CSAACXB"}
>
> Note that in StringCases a list of patterns {patt1, patt2, ...} is
> equivalent to patt1 | patt2 | ... . We cannot directly use
> ShortestMatch[patt1 | patt2] because this merely makes all the
> quantifiers in the regex lazy but doesn't guarantee that we get the
> shortest possible match:
>
> In[2]:= StringPattern`PatternConvert[
>   ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
>
> Out[2]= {"(?ms)(?:A.*?B|A.*?C)", {}, {}, Hold[None]}
>
> In[3]:= StringCases["ACB",
>   ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
>
> Out[3]= {"ACB"}
>
> The shortest match would be "AC". So it's interesting to consider how
> we can obtain the same answer {"ACCZB", "CSAACXB"} using only
> RegularExpression without external conditions.
>
> Maxim Rytin
> m.r at inbox.ru

Here are two shorter solutions:

In[1]:= StringCases["ZACCZBNRCSAACXBXX",
s : ({"A", "B", "C"} ~~ ShortestMatch[__]) /;
Complement[{"A", "B", "C"}, Characters[s]] === {}]

Out[1]= {"ACCZB", "CSAACXB"}

In[2]:= StringCases["ZACCZBNRCSAACXBXX",
RegularExpression["(A|B|C).*?(?!\\1)(A|B|C).*?(?!\\1|\\2)(A|B|C)"]]

Out[2]= {"ACCZB", "CSAACXB"}

Maxim Rytin
m.r at inbox.ru

```
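
The regular expression in the second solution follows a regular scheme: each new capture group is preceded by a negative lookahead that excludes every letter already captured. As a minimal sketch, not part of the original thread, that scheme can be generated for an arbitrary list of letters. The helper name `distinctLettersRegex` is hypothetical. The sketch assumes the letters are distinct single characters with no regex metacharacters, and it uses `Riffle`, which requires a Mathematica version newer than the one current when this thread was written.

```mathematica
(* Hypothetical helper, not from the thread: build the backreference regex
   for a list of distinct single-character letters.
   Assumption: no letter is a regex metacharacter, so nothing is escaped. *)
distinctLettersRegex[letters_List] :=
 Module[{alt},
  (* alternation group such as "(A|B|C)" *)
  alt = "(" <> StringJoin[Riffle[letters, "|"]] <> ")";
  StringJoin @ Table[
    If[k == 1,
     (* the first group has nothing to exclude *)
     alt,
     (* lazy gap, then a lookahead rejecting groups \1 ... \(k-1) *)
     ".*?(?!" <>
      StringJoin[Riffle[("\\" <> ToString[#]) & /@ Range[k - 1], "|"]] <>
      ")" <> alt],
    {k, Length[letters]}]]

(* For {"A", "B", "C"} this builds the same pattern string as In[2] above *)
StringCases["ZACCZBNRCSAACXBXX",
 RegularExpression[distinctLettersRegex[{"A", "B", "C"}]]]
```

Since the generated string for `{"A", "B", "C"}` is the pattern used in In[2], the call should return the same two matches reported there. For letter sets containing characters such as `.` or `*`, the letters would need to be escaped before being joined.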
