MathGroup Archive 2006

Re: Pattern match question

  • To: mathgroup at smc.vnet.net
  • Subject: [mg68418] Re: Pattern match question
  • From: ab_def at prontomail.com
  • Date: Fri, 4 Aug 2006 03:59:35 -0400 (EDT)
  • References: <easj50$fq0$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

glassymeow at yahoo.com wrote:
> hi
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> I want to find the first occurrences of any of the
> six combinations of the letters of the set "ABC", globally and
> without overlap. The spacing between the letters is not
> important.
> in the above txt string the result must be:
> Out[]:=
> ACCZB
> CSAACXB
> I would like a solution using Mathematica regular expressions.
> The regex pattern  (A|B|C).*?(A|B|C).*?(A|B|C)  gives the output
> ACC ,  BNRCSA  ,  ACXB   because it considers the permutations
> and not the combinations.
> The following is an old-fashioned program that emulates the human
> pencil-and-paper method. It solves the problem, but I am sure there are
> better solutions.
>
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> ptrnLtrs = "";
> (* make a string of 26 zeros, one per letter of the alphabet *)
> For[i = 1, i <= 26, ptrnLtrs = StringJoin[ptrnLtrs, "0"]; i++]
> (* replace every letter of the pattern letters *)
> (* with a corresponding 1 in the string of zeros *)
> For[i = 1, i <= StringLength[letters],
>     num = ToCharacterCode[StringTake[letters, {i, i}]];
>     num = num - 64;
>     ptrnLtrs = StringReplacePart[ptrnLtrs, "1", Flatten[{num, num}]];
>     i++];
>
> (* the procedural pattern match *)
> ptrnLtrsBak = ptrnLtrs; y = 0;    (* backup for the ptrnLtrs *)
> beginFlag = 0; result = ""; lst = {};
> For[i = 1, i <= StringLength[txt],
>     OneLetter = StringTake[txt, {i, i}];
>     If[beginFlag == 0 && StringCases[letters, OneLetter] == {},
>        Goto[jmp]];
>      num = ToCharacterCode[StringTake[txt, {i, i}]] - 64;
>      If[StringTake[ptrnLtrs, num] == "1",
>         result = StringJoin[result, OneLetter];
>         ptrnLtrs = StringReplacePart[ptrnLtrs, "0", Flatten[{num, num}]];
>         , result = StringJoin[result, OneLetter];];
>       beginFlag = 1;
>       If[ToExpression[ptrnLtrs] == 0, ptrnLtrs = ptrnLtrsBak;
>           Print[result];
>           result = "";  beginFlag = 0;];
>        Label[jmp];
>        i++]
>
> Out[]:=
> ACCZB
> CSAACXB
>
> regards
> peter glassy

Here's a solution that uses string expressions:

In[1]:= Module[
  {Lpatt = StringExpression @@@ (Insert[#, ___, {{2}, {3}}]&) /@
      Permutations@ {"A", "B", "C"}},
  StringCases["ZACCZBNRCSAACXBXX",
    ShortestMatch[s__] /; StringMatchQ[s, Lpatt]]]

Out[1]= {"ACCZB", "CSAACXB"}
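
To see how the one-liner is put together, the sketch below unpacks it
into separate steps. The names perms, withGaps and pattList are
introduced here purely for illustration; the logic is meant to be the
same as in In[1].

```mathematica
(* Illustrative names only: perms, withGaps and pattList do not appear in In[1] *)
perms = Permutations[{"A", "B", "C"}];   (* the six orderings of the letters *)

(* Insert ___ (zero or more characters) before positions 2 and 3 of each ordering, *)
(* so that {"A", "B", "C"} becomes {"A", ___, "B", ___, "C"} *)
withGaps = Insert[#, ___, {{2}, {3}}] & /@ perms;

(* Turn each list into a string pattern such as "A" ~~ ___ ~~ "B" ~~ ___ ~~ "C" *)
pattList = StringExpression @@@ withGaps;

(* From each starting position, take the shortest substring s that matches *)
(* one of the six patterns in full *)
StringCases["ZACCZBNRCSAACXBXX",
  ShortestMatch[s__] /; StringMatchQ[s, pattList]]
```

Because StringMatchQ has to match the whole of s, a candidate substring
must begin and end with one of the three letters, which is why the
leading "Z" is skipped and the first result starts at "A".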

Note that in StringCases a list of patterns {patt1, patt2, ...} is
equivalent to patt1 | patt2 | ... We cannot directly use
ShortestMatch[patt1 | patt2] because this merely makes all the
quantifiers in the regex lazy. The regex engine still tries the
alternatives from left to right and returns the first one that
succeeds, so there is no guarantee that we get the shortest possible
match:

In[2]:= StringPattern`PatternConvert[
  ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]

Out[2]= {"(?ms)(?:A.*?B|A.*?C)", {}, {}, Hold[None]}

In[3]:= StringCases["ACB",
  ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]

Out[3]= {"ACB"}

The shortest match would be "AC". So it's interesting to consider how
we can obtain the same answer {"ACCZB", "CSAACXB"} using only
RegularExpression without external conditions.
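
One possible answer, offered as a sketch rather than a general
solution: for each possible first letter, skip characters that are
neither of the two remaining letters, take whichever of them appears
first, then skip ahead to the last missing letter. The regex below is
written out by hand for the set "ABC"; it and the name rxString are
assumptions of this sketch.

```mathematica
(* Hand-written for the letters A, B, C; rxString is an illustrative name *)
rxString =
  "A[^BC]*(?:B[^C]*C|C[^B]*B)|" <>
  "B[^AC]*(?:A[^C]*C|C[^A]*A)|" <>
  "C[^AB]*(?:A[^B]*B|B[^A]*A)";

StringCases["ZACCZBNRCSAACXBXX", RegularExpression[rxString]]
(* expected: {"ACCZB", "CSAACXB"} *)
```

The negated character classes make each alternative stop at the
earliest point where all three letters have been seen, so no lazy
quantifiers or external conditions are needed. The pattern grows
quickly with the size of the letter set, though.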

Maxim Rytin
m.r at inbox.ru

