Re: Pure Function for String Selection

*To*: mathgroup at smc.vnet.net
*Subject*: [mg61051] Re: Pure Function for String Selection
*From*: Maxim <ab_def at prontomail.com>
*Date*: Sat, 8 Oct 2005 02:49:51 -0400 (EDT)
*References*: <dht479$hl1$1@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

On Tue, 4 Oct 2005 05:33:29 +0000 (UTC), Edson Ferreira <edsferr at uol.com.br> wrote:

> Dear members,
>
> I want to define a pure function to filter a set of strings.
>
> The strings that compose the set have all the same length and the only
> characters in these strings are "1", "X" and "2".
>
> The function that I want is like the one below:
>
> In[1]:=
> Unprotect[D];
> In[2]:=
> U={"2","X"};
> In[3]:=
> M={"1","2"};
> In[4]:=
> D={"1","X"};
> In[5]:=
> T={"1","2","X"};
> In[6]:=
> L=Flatten[Outer[StringJoin,T,T,T,D]];
> In[7]:=
> L = Select[L, Count[Characters[#], "1"] > 1 &];
>
> In this case, it counts the number of characters "1" in each string and
> selects the ones that have more than one "1".
>
> I want a pure function, to be applied like the one in the example above,
> but for a different task.
>
> For each string, I want it to count the maximum number of repeated
> characters for each character.
>
> In other words, it must count the maximum number of repeated "1", "X"
> and "2" for each string.
>
> The string must be "selected" if:
>
> The longest run of repeated "1" is shorter than 8 characters
> AND
> The longest run of repeated "X" is shorter than 6 characters
> AND
> The longest run of repeated "2" is shorter than 6 characters
>
> For example:
> "11112X122X1XXX" should be "selected"
> (there are four "1" in sequence, 3 "X" in sequence and 2 "2" in sequence)
>
> "122XXXXXX222XX" should NOT be "selected"
> (there are six "X" in sequence)
>
> "11111111222112" should NOT be "selected"
> (there are 8 "1" in sequence)
>
> Thanks a lot !!!!!
>
> Edson Ferreira
>

This is very straightforward to do with RegularExpression:

    In[1]:= Select[{"11112X122X1XXX", "122XXXXXX222XX", "11111111222112"},
              StringFreeQ[#, RegularExpression["1{8,}|X{6,}|2{6,}"]]&]

    Out[1]= {"11112X122X1XXX"}

There is one catch, though: the {m,} quantifier (which means m or more occurrences in a row) is not documented in Mathematica.
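
If relying on an undocumented quantifier is a concern, the same filter can be sketched without any regex by splitting each string into runs of identical characters. This is only an illustration, not part of the original reply: longestRun and runFilter are hypothetical names introduced here, and the thresholds are the ones stated in the question.

```mathematica
(* longestRun is a hypothetical helper: length of the longest run of
   character c in string s, or 0 if c does not occur at all *)
longestRun[s_String, c_String] :=
  Max[0, Length /@ Cases[Split[Characters[s]], {c ..}]]

(* pure function for Select, using the limits from the question:
   runs of "1" shorter than 8, runs of "X" and "2" shorter than 6 *)
runFilter = (longestRun[#, "1"] < 8 && longestRun[#, "X"] < 6 &&
     longestRun[#, "2"] < 6) &;

Select[{"11112X122X1XXX", "122XXXXXX222XX", "11111111222112"}, runFilter]
(* {"11112X122X1XXX"} *)
```

Split groups adjacent equal characters, so the three limits can be changed independently without touching a regex. It is likely slower than the StringFreeQ version on long lists.
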
The {m,} quantifier is a very basic construct, but the Mathematica documentation for RegularExpression has many other omissions, which leaves it unclear whether certain features are safe to use. In particular, the documentation doesn't mention named patterns, atomic grouping (?>), conditions, or recursive patterns, even though they all seem to be available.

Besides, Mathematica string patterns and regex patterns don't go together well:

    In[2]:= StringMatchQ["aa", RegularExpression["(.)\\1"]]

    Out[2]= True

    In[3]:= StringMatchQ["aa", x : RegularExpression["(.)\\1"]]

    Out[3]= False

Here x is represented as a numbered subpattern too, so \\1 now refers to the whole expression. This is mentioned in the Advanced Documentation, but it's not obvious how to resolve it without named subpatterns (?P<name>): we cannot use x:RegularExpression["(.)\\2"], as it generates an error (RegularExpression::error15).

Another complication is that we can't use $n to refer to numbered subpatterns on the rhs of the rule if the pattern includes Condition or PatternTest:

    In[4]:= StringCases["a1b2", RegularExpression["(.)\\d"]? (OddQ @@ ToCharacterCode@ #&) -> "$1"]

    Out[4]= {"$1"}

This looks more like a bug than a deliberate design, and in any case it isn't explained in the documentation. So it seems safe to use RegularExpression only by itself, not in combination with pattern names, conditions, or tests.

On the other hand, if one needs to work with strings of digit characters, it may be better to use RegularExpression because of some bugs in the automatic conversion of string patterns to regexes:

    In[5]:= StringMatchQ["112", x_ ~~ x_ ~~ "2"]

    Out[5]= False

We can see what went wrong by examining the internal form of the pattern:

    In[6]:= StringPattern`PatternConvert[x_ ~~ x_ ~~ "2"]

    Out[6]= {"(?ms)(.)\\12", {{Hold[x], 1}}, {}, Hold[None]}

The sequence \\12 is read as backreference number 12, not as backreference 1 followed by "2". The pattern should have been "(.)(?:\\1)2".

Maxim Rytin
m.r at inbox.ru
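
The corrected regex from the last paragraph of the message can also be passed to RegularExpression directly, which avoids the automatic string-pattern conversion altogether. This is a minimal sketch, assuming only the standard non-capturing group syntax (?:...); the results in the comments are what those regex semantics imply.

```mathematica
(* (?:\\1) ends the backreference before the literal "2", so the
   regex engine cannot read the two digits as backreference 12 *)
StringMatchQ["112", RegularExpression["(.)(?:\\1)2"]]
(* True *)

(* the first character is not repeated here, so there is no match *)
StringMatchQ["122", RegularExpression["(.)(?:\\1)2"]]
(* False *)
```
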

**Follow-Ups**:
**Re: Re: Pure Function for String Selection**
*From:* "Oyvind Tafjord" <tafjord@wolfram.com>
