Re: Pattern match question
- To: mathgroup at smc.vnet.net
- Subject: [mg68418] Re: Pattern match question
- From: ab_def at prontomail.com
- Date: Fri, 4 Aug 2006 03:59:35 -0400 (EDT)
- References: <easj50$fq0$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
glassymeow at yahoo.com wrote:
> hi
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> i want to find the first occurrences of any of the
> six combinations of the letters of the set "ABC" Globally, and
> without overlap option. and the space between letters does not
> important.
> in the above txt string the result must be:
> Out[]:=
> ACCZB
> CSAACXB
> i wish a solution using mathematica regular expressions.
> the Regex pattern (A|B|C).*?(A|B|C).*?(A|B|C) will give the out:
> ACC , BNRCSA , ACXB because it considers the permutations
> and not the combinations
> the following is an old fashion program which will emulate the human
> pencil and
> paper method, will solve the problem, but i am sure there are a better
> solutions.
>
> txt = "ZACCZBNRCSAACXBXX";
> letters = "ABC";
> ptrnLtrs = "";
> (* make a string of 26 zero's as the number of the alphbet*)
> For[i = 1, i <= 26, ptrnLtrs = StringJoin[ptrnLtrs, "0"]; i++]
> (* replace every letter of the pattern letters *)
> (* with a corresponding 1 in the string of the zero's *)
> For[i = 1, i <= StringLength[letters],
>   num = ToCharacterCode[StringTake[letters, {i, i}]];
>   num = num - 64;
>   ptrnLtrs = StringReplacePart[ptrnLtrs, "1", Flatten[{num, num}]];
>   i++];
>
> (* the procedural pattern match *)
> ptrnLtrsBak = ptrnLtrs; y = 0; (* backup for the ptrnLtrs *)
> beginFlag = 0; result = ""; lst = {};
> For[i = 1, i <= StringLength[txt],
>   OneLetter = StringTake[txt, {i, i}];
>   If[beginFlag == 0 && StringCases[letters, OneLetter] == {},
>     Goto[jmp]];
>   num = ToCharacterCode[StringTake[txt, {i, i}]] - 64;
>   If[StringTake[ptrnLtrs, num] == "1",
>     result = StringJoin[result, OneLetter];
>     ptrnLtrs = StringReplacePart[ptrnLtrs, "0", Flatten[{num,
>       num}]];
>     , result = StringJoin[result, OneLetter];];
>   beginFlag = 1;
>   If[ToExpression[ptrnLtrs] == 0, ptrnLtrs = ptrnLtrsBak;
>     Print[result];
>     result = ""; beginFlag = 0;];
>   Label[jmp];
>   i++]
>
> Out[]:=
> ACCZB
> CSAACXB
>
> regards
> peter glassy

Here's a solution that uses string expressions:

In[1]:= Module[
  {Lpatt = StringExpression @@@
     (Insert[#, ___, {{2}, {3}}]&) /@
       Permutations@ {"A", "B", "C"}},
  StringCases["ZACCZBNRCSAACXBXX",
    ShortestMatch[s__] /; StringMatchQ[s, Lpatt]]]

Out[1]= {"ACCZB", "CSAACXB"}

Note that in StringCases a list of patterns {patt1, patt2, ...} is
equivalent to patt1 | patt2 | ... . We cannot directly use
ShortestMatch[patt1 | patt2] because this merely makes all the
quantifiers in the regex lazy but doesn't guarantee that we get the
shortest possible match:

In[2]:= StringPattern`PatternConvert[
  ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]

Out[2]= {"(?ms)(?:A.*?B|A.*?C)", {}, {}, Hold[None]}

In[3]:= StringCases["ACB",
  ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]

Out[3]= {"ACB"}

The shortest match would be "AC". So it's interesting to consider how
we can obtain the same answer {"ACCZB", "CSAACXB"} using only
RegularExpression without external conditions.

Maxim Rytin
m.r at inbox.ru
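
One possible way to approach the closing question is sketched here. It is not taken from the thread and may differ from what the follow-ups propose. The idea is to make the regex deterministic instead of lazy. Each top-level alternative fixes the first letter, and negated character classes prevent the match from running past a letter that is still missing, so a match starting at a given letter ends at the first position where all three letters have been seen. The name shortestABC is a hypothetical helper introduced only for this illustration.

```mathematica
(* Assumption: shortestABC is an illustrative name, not part of the original post. *)
(* Alternative 1 reads: an "A", then anything except B or C, then either   *)
(* B ... first C  or  C ... first B. The other alternatives are symmetric. *)
shortestABC =
  "A[^BC]*(?:B[^C]*C|C[^B]*B)|" <>
  "B[^AC]*(?:A[^C]*C|C[^A]*A)|" <>
  "C[^AB]*(?:A[^B]*B|B[^A]*A)";

(* StringCases scans left to right and returns non-overlapping matches. *)
StringCases["ZACCZBNRCSAACXBXX", RegularExpression[shortestABC]]
(* expected: {"ACCZB", "CSAACXB"} *)
```

The price of this form is size: the pattern spells out every ordering of the letters, so it grows factorially with the size of the letter set, whereas the condition-based In[1] generates its permutations programmatically.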
- Follow-Ups:
- Re: Re: Pattern match question
- From: "Oyvind Tafjord" <tafjord@wolfram.com>
- Re: Re: Pattern match question