Re:Re: Pure Function for String Selection
- To: mathgroup at smc.vnet.net
- Subject: [mg61071] Re:[mg61051] Re: Pure Function for String Selection
- From: "Edson Ferreira" <edsferr at uol.com.br>
- Date: Sun, 9 Oct 2005 01:35:37 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Dear Members,

The new winner is Maxim Rytin with his filter5 function:

In[1]:=
selectString1Q[str_String] := Module[{ch = Characters[str]},
   ch = {First[#], Length[#]} & /@ Split[ch];
   Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
    Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 &&
    Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8];

In[2]:=
selectString2Q[str_String] := Module[{ch},
   ch = StringCases[str, y : (x_) .. :> {x, StringLength[y]}];
   Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
    Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 &&
    Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8];

In[3]:=
maxseq[s_String, z_String] := Max[StringLength /@ StringCases[s, z ..]];
selectString3Q[str_String] :=
  maxseq[str, "1"] < 8 && maxseq[str, "X"] < 6 && maxseq[str, "2"] < 6;

In[5]:=
selectString4Q[str_String] :=
  StringFreeQ[str, "1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"] &&
   StringFreeQ[str, "X"~~"X"~~"X"~~"X"~~"X"~~"X"] &&
   StringFreeQ[str, "2"~~"2"~~"2"~~"2"~~"2"~~"2"];

In[6]:=
selectString5Q[str_String] :=
  StringFreeQ[str, RegularExpression["1{8,}|X{6,}|2{6,}"]];

In[7]:=
filter1[origList : {__String}] := Select[origList, selectString1Q[#] &];
filter2[origList : {__String}] := Select[origList, selectString2Q[#] &];
filter3[origList : {__String}] := Select[origList, selectString3Q[#] &];
filter4[origList : {__String}] := Select[origList, selectString4Q[#] &];
filter5[origList : {__String}] := Select[origList, selectString5Q[#] &];

In[12]:=
makeList[strLen_, listLen_] :=
  Table[StringJoin[{"1", "2", "X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]],
   {listLen}];

In[13]:=
egList1 = makeList[14, 1000];
egList2 = makeList[30, 1000];
egList3 = makeList[30, 20000];
egList4 = makeList[100, 20000];

In[17]:=
{Timing[Length@filter3[egList1]], Timing[Length@filter2[egList1]],
 Timing[Length@filter1[egList1]], Timing[Length@filter4[egList1]],
 Timing[Length@filter5[egList1]]}

Out[17]=
{{0.4 Second,976},{1.232 Second,976},{1.262 Second,976},{0.26 Second,
  976},{0.07 Second,976}}

In[18]:=
{Timing[Length@filter1[egList2]], Timing[Length@filter3[egList2]],
 Timing[Length@filter2[egList2]], Timing[Length@filter4[egList2]],
 Timing[Length@filter5[egList2]]}

Out[18]=
{{2.243 Second,948},{0.411 Second,948},{2.143 Second,948},{0.24 Second,
  948},{0.07 Second,948}}

In[19]:=
{Timing[Length@filter2[egList3]], Timing[Length@filter1[egList3]],
 Timing[Length@filter3[egList3]], Timing[Length@filter5[egList3]],
 Timing[Length@filter4[egList3]]}

Out[19]=
{{43.302 Second,19043},{42.231 Second,19043},{8.712 Second,
  19043},{1.392 Second,19043},{4.807 Second,19043}}

In[20]:=
{Timing[Length@filter5[egList4]], Timing[Length@filter2[egList4]],
 Timing[Length@filter3[egList4]], Timing[Length@filter1[egList4]],
 Timing[Length@filter4[egList4]]}

Out[20]=
{{2.854 Second,16685},{121.445 Second,16685},{13.719 Second,
  16685},{119.893 Second,16685},{5.868 Second,16685}}

An even better solution!!!

Thanks and congratulations!

Edson Ferreira
Mechanical Engineer - Brazil

> On Tue, 4 Oct 2005 05:33:29 +0000 (UTC), Edson Ferreira
> wrote:
>
> > Dear members,
> >
> > I want to define a pure function to filter a set of strings.
> >
> > The strings that compose the set all have the same length, and the only
> > characters in these strings are "1", "X" and "2".
> >
> > The function that I want is like the one below:
> >
> > In[1]:=
> > Unprotect[D];
> > In[2]:=
> > U={"2","X"};
> > In[3]:=
> > M={"1","2"};
> > In[4]:=
> > D={"1","X"};
> > In[5]:=
> > T={"1","2","X"};
> > In[6]:=
> > L=Flatten[Outer[StringJoin,T,T,T,D]];
> > In[7]:=
> > L = Select[L, Count[Characters[#], "1"] > 1 &];
> >
> > In this case, it counts the number of characters "1" in each string and
> > selects the ones that have more than one "1".
> >
> > I want a pure function, to be applied like the one in the example above,
> > but for a different task.
> >
> > For each string, I want it to count the maximum number of repeated
> > characters for each character.
> >
> > In other words, it must count the maximum number of repeated "1", "X"
> > and "2" for each string.
> >
> > The string must be "selected" if:
> >
> > The longest run of repeated "1" is shorter than 8 characters
> > AND
> > The longest run of repeated "X" is shorter than 6 characters
> > AND
> > The longest run of repeated "2" is shorter than 6 characters
> >
> > For example:
> > "11112X122X1XXX" should be "selected"
> > (there are four "1" in sequence, 3 "X" in sequence and 2 "2" in sequence)
> >
> > "122XXXXXX222XX" should NOT be "selected"
> > (there are six "X" in sequence)
> >
> > "11111111222112" should NOT be "selected"
> > (there are 8 "1" in sequence)
> >
> > Thanks a lot !!!!!
> >
> > Edson Ferreira
> >
> >
>
> This is very straightforward to do with RegularExpression:
>
> In[1]:= Select[{"11112X122X1XXX", "122XXXXXX222XX", "11111111222112"},
> StringFreeQ[#, RegularExpression["1{8,}|X{6,}|2{6,}"]]&]
>
> Out[1]= {"11112X122X1XXX"}
>
> There is one catch though: in Mathematica the {m,} quantifier is not
> documented (it means m or more occurrences in a row). It's a very basic
> construct, but the Mathematica documentation for RegularExpression
> contains many other omissions where it's not clear whether it's safe to
> use certain features. In particular, the documentation doesn't mention
> named patterns; atomic grouping (?>); conditions; recursive patterns, even
> though they all seem to be available.
>
> Besides, Mathematica string patterns and regex patterns don't go together
> well:
>
> In[2]:= StringMatchQ["aa", RegularExpression["(.)\\1"]]
>
> Out[2]= True
>
> In[3]:= StringMatchQ["aa", x : RegularExpression["(.)\\1"]]
>
> Out[3]= False
>
> Here x is represented as a numbered subpattern too, so \\1 now refers to
> the whole expression.
> This is mentioned in the Advanced Documentation, but
> it's not obvious how to resolve this without named subpatterns (?P):
> we cannot use x:RegularExpression["(.)\\2"] as it generates an error
> (RegularExpression::error15).
>
> Another complication is that we can't use $n to refer to numbered
> subpatterns on the rhs of the rule if the pattern includes Condition or
> PatternTest:
>
> In[4]:= StringCases["a1b2", RegularExpression["(.)\\d"]?
> (OddQ @@ ToCharacterCode@ #&) -> "$1"]
>
> Out[4]= {"$1"}
>
> It looks more like a bug than a deliberate design, and in any case it
> isn't explained in the documentation. So it seems safe to use
> RegularExpression only by itself, not in combination with pattern
> names/conditions/tests.
>
> On the other hand, if one needs to work with strings of digit characters,
> it may be better to use RegularExpression because of some bugs in the
> automatic conversion of string patterns to regexes:
>
> In[5]:= StringMatchQ["112", x_ ~~ x_ ~~ "2"]
>
> Out[5]= False
>
> We can see what went wrong by examining the internal form of the pattern:
>
> In[6]:= StringPattern`PatternConvert[x_ ~~ x_ ~~ "2"]
>
> Out[6]= {"(?ms)(.)\\12", {{Hold[x], 1}}, {}, Hold[None]}
>
> The sequence \\12 is the backreference number 12, not backreference 1
> followed by "2". The pattern should have been "(.)(?:\\1)2".
>
> Maxim Rytin
> m.r at inbox.ru
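
The corrected pattern quoted at the end of the message above can be tried directly as a hand-written RegularExpression, which bypasses the automatic conversion of string patterns altogether. This is only a minimal sketch: the name fixedPattern is illustrative and does not appear in the thread, and the behaviour is assumed to follow ordinary PCRE backreference rules.

```mathematica
(* Hypothetical name, not from the thread. The backreference \1 is wrapped
   in a non-capturing group (?: ) so that it cannot be read together with
   the following literal "2" as backreference number 12. *)
fixedPattern = RegularExpression["(.)(?:\\1)2"];

(* Two equal characters followed by "2" *)
StringMatchQ["112", fixedPattern]

(* First two characters differ, so the backreference cannot match *)
StringMatchQ["122", fixedPattern]
```

Under those assumptions the first call should give True and the second False, which is the result the string pattern x_ ~~ x_ ~~ "2" was presumably meant to produce for "112".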