Re: Pattern match question
- To: mathgroup at smc.vnet.net
- Subject: [mg68447] Re: Pattern match question
- From: ab_def at prontomail.com
- Date: Sat, 5 Aug 2006 03:46:50 -0400 (EDT)
- References: <easj50$fq0$1@smc.vnet.net><eauvga$16o$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
ab_def at prontomail.com wrote:
> glassymeow at yahoo.com wrote:
> > hi
> > txt = "ZACCZBNRCSAACXBXX";
> > letters = "ABC";
> > i want to find the first occurrences of any of the
> > six combinations of the letters of the set "ABC" Globally, and
> > without overlap option. and the space between letters does not
> > important.
> > in the above txt string the result must be:
> > Out[]:=
> > ACCZB
> > CSAACXB
> > i wish a solution using mathematica regular expressions.
> > the Regex pattern (A|B|C).*?(A|B|C).*?(A|B|C) will give the out:
> > ACC , BNRCSA , ACXB because it considers the permutations
> > and not the combinations
> > the following is an old fashion program which will emulate the human
> > pencil and paper method, will solve the problem, but i am sure there
> > are a better solutions.
> >
> > txt = "ZACCZBNRCSAACXBXX";
> > letters = "ABC";
> > ptrnLtrs = "";
> > (* make a string of 26 zero's as the number of the alphbet*)
> > For[i = 1, i <= 26, ptrnLtrs = StringJoin[ptrnLtrs, "0"]; i++]
> > (* replace every letter of the pattern letters *)
> > (* with a corresponding 1 in the string of the zero's *)
> > For[i = 1, i <= StringLength[letters],
> >   num = ToCharacterCode[StringTake[letters, {i, i}]];
> >   num = num - 64;
> >   ptrnLtrs = StringReplacePart[ptrnLtrs, "1", Flatten[{num, num}]];
> >   i++];
> >
> > (* the procedural pattern match *)
> > ptrnLtrsBak = ptrnLtrs; y = 0; (* backup for the ptrnLtrs *)
> > beginFlag = 0; result = ""; lst = {};
> > For[i = 1, i <= StringLength[txt],
> >   OneLetter = StringTake[txt, {i, i}];
> >   If[beginFlag == 0 && StringCases[letters, OneLetter] == {},
> >     Goto[jmp]];
> >   num = ToCharacterCode[StringTake[txt, {i, i}]] - 64;
> >   If[StringTake[ptrnLtrs, num] == "1",
> >     result = StringJoin[result, OneLetter];
> >     ptrnLtrs = StringReplacePart[ptrnLtrs, "0", Flatten[{num, num}]];
> >     , result = StringJoin[result, OneLetter];];
> >   beginFlag = 1;
> >   If[ToExpression[ptrnLtrs] == 0, ptrnLtrs = ptrnLtrsBak;
> >     Print[result];
> >     result = ""; beginFlag = 0;];
> >   Label[jmp];
> >   i++]
> >
> > Out[]:=
> > ACCZB
> > CSAACXB
> >
> > regards
> > peter glassy
>
> Here's a solution that uses string expressions:
>
> In[1]:= Module[
>   {Lpatt = StringExpression @@@ (Insert[#, ___, {{2}, {3}}]&) /@
>     Permutations@ {"A", "B", "C"}},
>   StringCases["ZACCZBNRCSAACXBXX",
>     ShortestMatch[s__] /; StringMatchQ[s, Lpatt]]]
>
> Out[1]= {"ACCZB", "CSAACXB"}
>
> Note that in StringCases a list of patterns {patt1, patt2, ...} is
> equivalent to patt1 | patt2 | ... . We cannot directly use
> ShortestMatch[patt1 | patt2] because this merely makes all the
> quantifiers in the regex lazy but doesn't guarantee that we get the
> shortest possible match:
>
> In[2]:= StringPattern`PatternConvert[
>   ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
>
> Out[2]= {"(?ms)(?:A.*?B|A.*?C)", {}, {}, Hold[None]}
>
> In[3]:= StringCases["ACB",
>   ShortestMatch[("A" ~~ ___ ~~ "B") | ("A" ~~ ___ ~~ "C")]]
>
> Out[3]= {"ACB"}
>
> The shortest match would be "AC". So it's interesting to consider how
> we can obtain the same answer {"ACCZB", "CSAACXB"} using only
> RegularExpression without external conditions.
>
> Maxim Rytin
> m.r at inbox.ru

Here are two shorter solutions:

In[1]:= StringCases["ZACCZBNRCSAACXBXX",
  s : ({"A", "B", "C"} ~~ ShortestMatch[__]) /;
    Complement[{"A", "B", "C"}, Characters[s]] === {}]

Out[1]= {"ACCZB", "CSAACXB"}

In[2]:= StringCases["ZACCZBNRCSAACXBXX",
  RegularExpression["(A|B|C).*?(?!\\1)(A|B|C).*?(?!\\1|\\2)(A|B|C)"]]

Out[2]= {"ACCZB", "CSAACXB"}

Maxim Rytin
m.r at inbox.ru
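
The regex in In[2] works because each later group is guarded by a negative lookahead on the earlier captures: (?!\1) rejects a second letter equal to the first, and (?!\1|\2) rejects a third letter equal to either of the first two, so a match ends only once all three distinct letters have been seen. As a rough sketch of how the same pattern could be generated for another letter set, the helper below builds the regex string from a string of letters. The name distinctLettersRegex is hypothetical and not part of the original post; it assumes the letters are distinct, contain no regex metacharacters, and that Riffle is available in the version used.

```mathematica
(* Hypothetical helper (not from the original post): builds the kind of
   regex used in In[2] for a string of distinct plain letters. *)
distinctLettersRegex[letters_String] :=
  Module[{alt, n = StringLength[letters], lookahead},
    (* "(A|B|C)": one capturing group accepting any letter of the set *)
    alt = "(" <> StringJoin[Riffle[Characters[letters], "|"]] <> ")";
    (* "(?!\1|\2|...)": reject a letter equal to any earlier capture *)
    lookahead[k_] := "(?!" <>
      StringJoin[Riffle[Table["\\" <> ToString[j], {j, k - 1}], "|"]] <> ")";
    (* groups are separated by lazy gaps ".*?" *)
    StringJoin[
      Riffle[Table[If[k == 1, alt, lookahead[k] <> alt], {k, n}], ".*?"]]]

(* For "ABC" this is intended to rebuild the pattern string from In[2] *)
distinctLettersRegex["ABC"]

(* Same call as In[2], with the generated pattern *)
StringCases["ZACCZBNRCSAACXBXX",
  RegularExpression[distinctLettersRegex["ABC"]]]

(* For contrast: the unguarded pattern from the original question, which
   the poster reports as returning ACC, BNRCSA, ACXB *)
StringCases["ZACCZBNRCSAACXBXX",
  RegularExpression["(A|B|C).*?(A|B|C).*?(A|B|C)"]]
```

If the generated string is identical to the one in In[2], the second call should return the same result as Out[2]. Letters that have a special meaning in regular expressions would need escaping first, which this sketch does not attempt.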