Re: Pattern match question
- To: mathgroup at smc.vnet.net
- Subject: [mg68418] Re: Pattern match question
- From: ab_def at prontomail.com
- Date: Fri, 4 Aug 2006 03:59:35 -0400 (EDT)
- References: <easj50$fq0$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
glassymeow at yahoo.com wrote:
> Hi,
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> I want to find the first occurrences of any of the
> six orderings of the letters of the set "ABC", globally and
> without overlap. Other characters may appear between the
> letters.
> For the above txt string the result must be:
> Out[]:=
> ACCZB
> CSAACXB
> I would like a solution using Mathematica regular expressions.
> The regex pattern (A|B|C).*?(A|B|C).*?(A|B|C) gives
> ACC, BNRCSA, ACXB, because it accepts any three letters from the
> set, repeats included, instead of requiring one each of A, B and C.
> The following old-fashioned program, which emulates the
> pencil-and-paper method, solves the problem, but I am sure there
> are better solutions.
>
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> ptrnLtrs = "";
> (* make a string of 26 zeros, one per letter of the alphabet *)
> For[i = 1, i <= 26, ptrnLtrs = StringJoin[ptrnLtrs, "0"]; i++]
> (* replace the zero at the position of every pattern letter *)
> (* with a 1 in the string of zeros *)
> For[i = 1, i <= StringLength[letters],
> num = ToCharacterCode[StringTake[letters, {i, i}]];
> num = num - 64;
> ptrnLtrs = StringReplacePart[ptrnLtrs, "1", Flatten[{num, num}]];
> i++];
>
> (* the procedural pattern match *)
> ptrnLtrsBak = ptrnLtrs; y = 0; (* backup for the ptrnLtrs *)
> beginFlag = 0; result = ""; lst = {};
> For[i = 1, i <= StringLength[txt],
> OneLetter = StringTake[txt, {i, i}];
> If[beginFlag == 0 && StringCases[letters, OneLetter] == {},
> Goto[jmp]];
> num = ToCharacterCode[StringTake[txt, {i, i}]] - 64;
> If[StringTake[ptrnLtrs, num] == "1",
> result = StringJoin[result, OneLetter];
> ptrnLtrs = StringReplacePart[ptrnLtrs, "0", Flatten[{num,
> num}]];
> , result = StringJoin[result, OneLetter];];
> beginFlag = 1;
> If[ToExpression[ptrnLtrs] == 0, ptrnLtrs = ptrnLtrsBak;
> Print[result];
> result = ""; beginFlag = 0;];
> Label[jmp];
> i++]
>
> Out[]:=
> ACCZB
> CSAACXB
>
> regards
> peter glassy
Here's a solution that uses string expressions:
In[1]:= Module[
{Lpatt = StringExpression @@@ (Insert[#, ___, {{2}, {3}}]&) /@
Permutations@ {"A", "B", "C"}},
StringCases["ZACCZBNRCSAACXBXX",
ShortestMatch[s__] /; StringMatchQ[s, Lpatt]]]
Out[1]= {"ACCZB", "CSAACXB"}
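To see what the Module is doing, the construction of Lpatt can be unrolled step by step. This is only an illustrative sketch: the names perms, withGaps and lpatt are introduced here and do not appear in the code above, and Alternatives is used in place of the bare list simply to make the "any of these" meaning explicit.

```mathematica
(* The six orderings of the three letters. *)
perms = Permutations[{"A", "B", "C"}];

(* Put ___ (any run of characters, possibly empty) between the letters,
   e.g. {"A", ___, "B", ___, "C"}. *)
withGaps = Insert[#, ___, {{2}, {3}}] & /@ perms;

(* Turn each list into a string pattern, e.g. "A" ~~ ___ ~~ "B" ~~ ___ ~~ "C". *)
lpatt = StringExpression @@@ withGaps;

(* A candidate substring is accepted if it matches any one ordering. *)
StringMatchQ["ACCZB", Alternatives @@ lpatt]   (* True *)
StringMatchQ["ACC", Alternatives @@ lpatt]     (* False: no "B" *)
```

In In[1], ShortestMatch[s__] grows the candidate s one character at a time from each starting position, and the condition stops it at the first candidate that matches one of these orderings, which is why the results are the shortest ones.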
Note that in StringCases a list of patterns {patt1, patt2, ...} is
equivalent to patt1 | patt2 | ... We cannot use
ShortestMatch[patt1 | patt2] directly, because it merely makes every
quantifier in the generated regex lazy; it does not guarantee that we
get the shortest possible match:
In[2]:= StringPattern`PatternConvert[
ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
Out[2]= {"(?ms)(?:A.*?B|A.*?C)", {}, {}, Hold[None]}
In[3]:= StringCases["ACB",
ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
Out[3]= {"ACB"}
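The shortest match would be "AC", but the regex alternation tries its
first branch, A.*?B, first and that branch succeeds with "ACB". So it's
interesting to consider how we can obtain the same answer
{"ACCZB", "CSAACXB"} using only RegularExpression without external
conditions.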
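One possible regex-only approach, offered as a sketch and not necessarily the answer given elsewhere in the thread, is to spell out the six orderings with negated character classes. After the first letter, the pattern forbids the two letters not yet seen until one of them appears, then forbids the remaining letter until it appears, so each match ends exactly where the third distinct letter first occurs. The names txt and re are used here for illustration only.

```mathematica
txt = "ZACCZBNRCSAACXBXX";

(* One branch per first letter; the inner alternation picks which of the
   other two letters comes second. *)
re = "A[^BC]*(?:B[^C]*C|C[^B]*B)|" <>
     "B[^AC]*(?:A[^C]*C|C[^A]*A)|" <>
     "C[^AB]*(?:A[^B]*B|B[^A]*A)";

StringCases[txt, RegularExpression[re]]
(* {"ACCZB", "CSAACXB"} *)
```

The drawback is that the pattern has one branch per ordering, so it grows factorially with the number of letters in the set.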
Maxim Rytin
m.r at inbox.ru