Re: Pure Function for String Selection

*To*: mathgroup at smc.vnet.net*Subject*: [mg61017] Re: Pure Function for String Selection*From*: "dkr" <dkrjeg at adelphia.net>*Date*: Fri, 7 Oct 2005 03:38:08 -0400 (EDT)*References*: <dht479$hl1$1@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

Edson, Consider the following 3 predicates: In[1]:= selectString1Q[str_String] := Module[{ch=Characters[str]},ch={First[#],Length[#]}&/@Split[ch]; Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 && Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 && Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8 ] ; selectString2Q[str_String] := Module[{ch}, ch =StringCases[str,y:(x_)..:>{x,StringLength[y]}]; Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 && Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 && Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8 ] ; maxseq[s_String,z_String]:=Max[StringLength/@StringCases[s,z..]]; selectString3Q[str_String]:= maxseq[str,"1"]<8&&maxseq[str,"X"]<6&&maxseq[str,"2"]<6; selectString1Q was used by Jens-Peer Kuska in his previous response in this thread. selectString2Q appeals to a pure string pattern matching technique to define ch rather than using Characters and Split as was done in selectString1Q. selectString3Q is the predicate used in my previous response in this thread. These 3 predicates can then be used to define the 3 filter functions: In[5]:= filter1[origList:{__String}]:=Select[origList,selectString1Q[#]&]; filter2[origList:{__String}]:=Select[origList,selectString2Q[#]&]; filter3[origList:{__String}]:=Select[origList,selectString3Q[#]&]; Finally define: In[8]:= makeList[strLen_,listLen_]:= Table[StringJoin[{"1","2","X"}\[LeftDoubleBracket] Table[Random[ Integer,{1,3}],{strLen}]\[RightDoubleBracket]],{listLen}]; makeList generates a list of listLen strings, with each string having length strLen and consisting of a random sequence of the three characters "1", "2", and "X". Now let's attempt to get a very rough feel for the relative speed of the 3 filter functions: In[9]:= egList1=makeList[14,1000]; egList2=makeList[30,1000]; egList3=makeList[30,20000]; egList4=makeList[100,20000]; In[13]:= {Timing[Length@filter3[egList1]],Timing[Length@filter2[egList1]], Timing[Length@filter1[egList1]]} Out[13]= {{0.14 Second,986},{0.42 Second,986},{0.33 Second,986}} In[14]:= {Timing[Length@filter1[egList2]],Timing[Length@filter3[egList2]], Timing[Length@filter2[egList2]]} Out[14]= {{0.72 Second,958},{0.09 Second,958},{0.37 Second,958}} In[15]:= {Timing[Length@filter2[egList3]],Timing[Length@filter1[egList3]], Timing[Length@filter3[egList3]]} Out[15]= {{7.85 Second,19057},{7.41 Second,19057},{1.68 Second,19057}} In[16]:= {Timing[Length@filter2[egList4]],Timing[Length@filter3[egList4]], Timing[Length@filter1[egList4]]} Out[16]= {{21. Second,16645},{3.03 Second,16645},{21.11 Second,16645}} (Note that the filter functions appear in a different order in each of the four inputs.) In all cases filter3 is the clear winner, with the two other filter functions being fairly close. The main difference between filter3 and the other two is that it makes three separate passes on each string with StringCases in order to determine the runs, while the other two functions essentially generate a single list of runs for all the 3 characters combined (filter1 by making a single pass with Split and filter 2 a single pass with StringCases), and then have to go back and identify which runs belong to which character from this generated set. In any event, if these timing results survive more extensive testing, it may suggest that working with your strings directly rather than decomposing your strings via Characters may be the way to go. This would be in accord with what you found in working with the string reduction problem which you discussed in a previous thread.

**Follow-Ups**:**Re: Re: Pure Function for String Selection***From:*"Edson Ferreira" <edsferr@uol.com.br>