MathGroup Archive 2005

Re:Re: Pure Function for String Selection

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61071] Re:[mg61051] Re: Pure Function for String Selection
  • From: "Edson Ferreira" <edsferr at uol.com.br>
  • Date: Sun, 9 Oct 2005 01:35:37 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

Dear Members,

The new winner is Maxim Rytin with his filter5 function:

In[1]:=
selectString1Q[str_String] :=
Module[{ch=Characters[str]},ch={First[#],Length[#]}&/@Split[ch];
Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 &&
Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8
] ;

In[2]:=
selectString2Q[str_String] :=
Module[{ch},
ch =StringCases[str,y:(x_)..:>{x,StringLength[y]}];
Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 &&
Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8
] ;

In[3]:=
maxseq[s_String,z_String]:=Max[StringLength/@StringCases[s,z..]];
selectString3Q[str_String]:=
maxseq[str,"1"]<8&&maxseq[str,"X"]<6&&maxseq[str,"2"]<6;

In[5]:=
selectString4Q[str_String]:=
    StringFreeQ[str,"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"] &&
      StringFreeQ[str,"X"~~"X"~~"X"~~"X"~~"X"~~"X"] &&
      StringFreeQ[str,"2"~~"2"~~"2"~~"2"~~"2"~~"2"];

In[6]:=
selectString5Q[str_String]:=
    StringFreeQ[str,RegularExpression["1{8,}|X{6,}|2{6,}"]];

In[7]:=
filter1[origList:{__String}]:=Select[origList,selectString1Q[#]&];
filter2[origList:{__String}]:=Select[origList,selectString2Q[#]&];
filter3[origList:{__String}]:=Select[origList,selectString3Q[#]&];
filter4[origList:{__String}]:=Select[origList,selectString4Q[#]&];
filter5[origList:{__String}]:=Select[origList,selectString5Q[#]&];
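
Before timing anything, it may help to confirm that the predicates agree on the three example strings from the original question. The sketch below is self-contained: regexQ repeats the pattern used in selectString5Q, while runQ, limits and examples are hypothetical names introduced only for this check (runQ measures run lengths with Split, in the spirit of selectString1Q).

```mathematica
(* The three example strings from the original question. *)
examples = {"11112X122X1XXX", "122XXXXXX222XX", "11111111222112"};

(* Regex predicate: same pattern as selectString5Q. *)
regexQ[s_String] :=
  StringFreeQ[s, RegularExpression["1{8,}|X{6,}|2{6,}"]];

(* Hypothetical reference predicate: every run of equal characters
   must be shorter than the limit for that character. *)
limits = {"1" -> 8, "X" -> 6, "2" -> 6};
runQ[s_String] :=
  And @@ (Length[#] < (First[#] /. limits) & /@ Split[Characters[s]]);

regexQ /@ examples   (* {True, False, False} *)
runQ /@ examples     (* {True, False, False} *)
```

Only the first string passes, which matches the expected results stated in the original question.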

In[12]:=
makeList[strLen_,listLen_]:=
Table[StringJoin[{"1","2","X"}[[Table[Random[Integer,{1,3}],{strLen}]]]],{listLen}];

In[13]:=
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];

In[17]:=
{Timing[Length@filter3[egList1]],Timing[Length@filter2[egList1]],
Timing[Length@filter1[egList1]],Timing[Length@filter4[egList1]],
  Timing[Length@filter5[egList1]]}

Out[17]=
{{0.4 Second,976},{1.232 Second,976},{1.262 Second,976},{0.26 Second,
    976},{0.07 Second,976}}

In[18]:=
{Timing[Length@filter1[egList2]],Timing[Length@filter3[egList2]],
Timing[Length@filter2[egList2]],Timing[Length@filter4[egList2]],
  Timing[Length@filter5[egList2]]}

Out[18]=
{{2.243 Second,948},{0.411 Second,948},{2.143 Second,948},{0.24 Second,
    948},{0.07 Second,948}}

In[19]:=
{Timing[Length@filter2[egList3]],Timing[Length@filter1[egList3]],
Timing[Length@filter3[egList3]],Timing[Length@filter5[egList3]],
  Timing[Length@filter4[egList3]]}

Out[19]=
{{43.302 Second,19043},{42.231 Second,19043},{8.712 Second,
    19043},{1.392 Second,19043},{4.807 Second,19043}}

In[20]:=
{Timing[Length@filter5[egList4]],Timing[Length@filter2[egList4]],
  Timing[Length@filter3[egList4]],
Timing[Length@filter1[egList4]],Timing[Length@filter4[egList4]]}

Out[20]=
{{2.854 Second,16685},{121.445 Second,16685},{13.719 Second,
    16685},{119.893 Second,16685},{5.868 Second,16685}}
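
Because the filters are called in a different order in each run, the outputs above are not directly comparable by position. As a rough reading aid, the sketch below relabels the Out[20] timings (egList4) by filter name and expresses each as a multiple of the fastest one. The names timings and fastest are introduced here for illustration only; the numbers are copied from Out[20].

```mathematica
(* Timings in seconds copied from Out[20], matched to the call order in In[20]:
   filter5, filter2, filter3, filter1, filter4. *)
timings = {"filter1" -> 119.893, "filter2" -> 121.445,
   "filter3" -> 13.719, "filter4" -> 5.868, "filter5" -> 2.854};

fastest = Min[Last /@ timings];

(* Each filter's time as a multiple of the fastest time. *)
{First[#], Last[#]/fastest} & /@ timings
```

On these figures filter1 and filter2 take roughly 42 times as long as filter5, filter3 roughly 5 times, and filter4 roughly twice as long.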

An even better solution!!!

Thanks and congratulations!

Edson Ferreira
Mechanical Engineer - Brazil



> On Tue, 4 Oct 2005 05:33:29 +0000 (UTC), Edson Ferreira 
> wrote: 
> 
> > Dear members, 
> > 
> > I want to define a pure function to filter a set of strings. 
> > 
> > The strings that compose the set have all the same length and the only
> > characters in these strings are "1", "X" and "2".
> > 
> > The function that I want is like the one below:
> > 
> > In[1]:= 
> > Unprotect[D]; 
> > In[2]:= 
> > U={"2","X"}; 
> > In[3]:= 
> > M={"1","2"}; 
> > In[4]:= 
> > D={"1","X"}; 
> > In[5]:= 
> > T={"1","2","X"}; 
> > In[6]:= 
> > L=Flatten[Outer[StringJoin,T,T,T,D]]; 
> > In[7]:= 
> > L = Select[L, Count[Characters[#], "1"] > 1 &]; 
> > 
> > In this case, it counts the number of characters "1" in each string and
> > selects the ones that have more than one "1".
> > 
> > I want a pure function, to be applied like the one in the example above,
> > but for a different task.
> > 
> > For each string, I want it to count the maximum number of repeated 
> > characters for each character. 
> > 
> > In other words, it must count the maximum number of repeated "1", "X"
> > and "2" for each string.
> > 
> > The string must be "selected" if: 
> > 
> > The longest run of repeated "1" is shorter than 8 characters 
> > AND 
> > The longest run of repeated "X" is shorter than 6 characters 
> > AND 
> > The longest run of repeated "2" is shorter than 6 characters 
> > 
> > For example: 
> > "11112X122X1XXX" should be "selected" 
> > (there are four "1" in sequence, 3 "X" in sequence and 2 "2" in sequence)
> > 
> > "122XXXXXX222XX" should NOT be "selected" 
> > (there are six "X" in sequence) 
> > 
> > "11111111222112" should NOT be "selected" 
> > (there are 8 "1" in sequence) 
> > 
> > Thanks a lot !!!!! 
> > 
> > Edson Ferreira 
> > 
> > 
> 
> This is very straightforward to do with RegularExpression: 
> 
> In[1]:= Select[{"11112X122X1XXX", "122XXXXXX222XX", "11111111222112"},
> StringFreeQ[#, RegularExpression["1{8,}|X{6,}|2{6,}"]]&]
> 
> Out[1]= {"11112X122X1XXX"} 
> 
> There is one catch though: in Mathematica the {m,} quantifier is not
> documented (it means m or more occurrences in a row). It's a very basic
> construct, but the Mathematica documentation for RegularExpression
> contains many other omissions where it's not clear whether it's safe to
> use certain features. In particular, the documentation doesn't mention
> named patterns; atomic grouping (?>); conditions; recursive patterns, even
> though they all seem to be available.
> 
> Besides, Mathematica string patterns and regex patterns don't go together
> well:
> 
> In[2]:= StringMatchQ["aa", RegularExpression["(.)\\1"]] 
> 
> Out[2]= True 
> 
> In[3]:= StringMatchQ["aa", x : RegularExpression["(.)\\1"]] 
> 
> Out[3]= False 
> 
> Here x is represented as a numbered subpattern too, so \\1 now refers to
> the whole expression. This is mentioned in the Advanced Documentation, but
> it's not obvious how to resolve this without named subpatterns (?P): 
> we cannot use x:RegularExpression["(.)\\2"] as it generates an error 
> (RegularExpression::error15). 
> 
> Another complication is that we can't use $n to refer to numbered 
> subpatterns on the rhs of the rule if the pattern includes Condition or=
 
> PatternTest: 
> 
> In[4]:= StringCases["a1b2", RegularExpression["(.)\\d"]? 
> (OddQ @@ ToCharacterCode@ #&) -> "$1"] 
> 
> Out[4]= {"$1"} 
> 
> It looks more like a bug than a deliberate design, and in any case it 
> isn't explained in the documentation. So it seems safe to use 
> RegularExpression only by itself, not in combination with pattern 
> names/conditions/tests. 
> 
> On the other hand, if one needs to work with strings of digit characters,
> it may be better to use RegularExpression because of some bugs in the 
> automatic conversion of string patterns to regexes: 
> 
> In[5]:= StringMatchQ["112", x_ ~~ x_ ~~ "2"] 
> 
> Out[5]= False 
> 
> We can see what went wrong by examining the internal form of the pattern:
> 
> In[6]:= StringPattern`PatternConvert[x_ ~~ x_ ~~ "2"] 
> 
> Out[6]= {"(?ms)(.)\\12", {{Hold[x], 1}}, {}, Hold[None]} 
> 
> The sequence \\12 is the backreference number 12, not backreference 1 
> followed by "2". The pattern should have been "(.)(?:\\1)2". 
> 
> Maxim Rytin 
> m.r at inbox.ru 
> 
> 
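
As a follow-up to the \\12 issue at the end of the quoted message: writing the regex by hand, with the backreference wrapped in a non-capturing group as suggested there, should avoid the ambiguity. This is a minimal sketch that assumes only that RegularExpression accepts the (?:...) grouping; it does not change how Mathematica converts string patterns internally.

```mathematica
(* Backreference 1 followed by a literal "2". The non-capturing group
   (?:\\1) stops the engine from reading the sequence as backreference 12. *)
StringMatchQ["112", RegularExpression["(.)(?:\\1)2"]]   (* True *)

(* The first two characters differ here, so the backreference fails. *)
StringMatchQ["122", RegularExpression["(.)(?:\\1)2"]]   (* False *)
```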

