Re:Re: Pure Function for String Selection
- To: mathgroup at smc.vnet.net
- Subject: [mg61029] Re:[mg61017] Re: Pure Function for String Selection
- From: "Edson Ferreira" <edsferr at uol.com.br>
- Date: Sat, 8 Oct 2005 02:48:35 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Dear dkr, That's what I call optimization! I've seen the comparison among the solutions and yours goes in the direction you took previously with my list set reducing problem. The decomposition via Characters takes an "extra" time of computation when we talk about large sets of lists. In fact, I've been trying hard to develop a solution for this problem using my poor knowledge in patterns in Mathematica. I will let you know my results. By the way, I'm not using Characters. I have made great peformance optimizations using your solutions. Thanks again for your time and attention!!! Edson > Edson, > > Consider the following 3 predicates: > > In[1]:== > selectString1Q[str_String] :== > Module[{ch==Characters[str]},ch=={First[#],Length[#]}&/@Split[ch]; > Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 && > Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 && > Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8 > ] ; > > selectString2Q[str_String] :== > Module[{ch}, > ch ==StringCases[str,y:(x_)..:>{x,StringLength[y]}]; > Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 && > Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] < 6 && > Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] < 8 > ] ; > > maxseq[s_String,z_String]:==Max[StringLength/@StringCases[s,z..]]; > selectString3Q[str_String]:== > maxseq[str,"1"]<8&&maxseq[str,"X"]<6&&maxseq[str,"2"]<6; > > selectString1Q was used by Jens-Peer Kuska in his previous response in > this thread. selectString2Q appeals to a pure string pattern matching > technique to define ch rather than using Characters and Split as was > done in selectString1Q. selectString3Q is the predicate used in my > previous response in this thread. These 3 predicates can then be used > to define the 3 filter functions: > > In[5]:== > filter1[origList:{__String}]:==Select[origList,selectString1Q[#]&]; > filter2[origList:{__String}]:==Select[origList,selectString2Q[#]&]; > filter3[origList:{__String}]:==Select[origList,selectString3Q[#]&]; > > Finally define: > > In[8]:== > makeList[strLen_,listLen_]:== > Table[StringJoin[{"1","2","X"}\[LeftDoubleBracket] > Table[Random[ > > Integer,{1,3}],{strLen}]\[RightDoubleBracket]],{listLen}]; > > makeList generates a list of listLen strings, with each string having > length strLen and consisting of a random sequence of the three > characters "1", "2", and "X". > > Now let's attempt to get a very rough feel for the relative speed of > the 3 filter functions: > > In[9]:== > egList1==makeList[14,1000]; > egList2==makeList[30,1000]; > egList3==makeList[30,20000]; > egList4==makeList[100,20000]; > > In[13]:== > {Timing[Length@filter3[egList1]],Timing[Length@filter2[egList1]], > Timing[Length@filter1[egList1]]} > Out[13]== > {{0.14 Second,986},{0.42 Second,986},{0.33 Second,986}} > > In[14]:== > {Timing[Length@filter1[egList2]],Timing[Length@filter3[egList2]], > Timing[Length@filter2[egList2]]} > Out[14]== > {{0.72 Second,958},{0.09 Second,958},{0.37 Second,958}} > > In[15]:== > {Timing[Length@filter2[egList3]],Timing[Length@filter1[egList3]], > Timing[Length@filter3[egList3]]} > Out[15]== > {{7.85 Second,19057},{7.41 Second,19057},{1.68 Second,19057}} > > In[16]:== > {Timing[Length@filter2[egList4]],Timing[Length@filter3[egList4]], > Timing[Length@filter1[egList4]]} > Out[16]== > {{21. Second,16645},{3.03 Second,16645},{21.11 Second,16645}} > > (Note that the filter functions appear in a different order in each of > the four inputs.) > > In all cases filter3 is the clear winner, with the two other filter > functions being fairly close. The main difference between filter3 and > the other two is that it makes three separate passes on each string > with StringCases in order to determine the runs, while the other two > functions essentially generate a single list of runs for all the 3 > characters combined (filter1 by making a single pass with Split and > filter 2 a single pass with StringCases), and then have to go back and > identify which runs belong to which character from this generated set. > In any event, if these timing results survive more extensive testing, > it may suggest that working with your strings directly rather than > decomposing your strings via Characters may be the way to go. This > would be in accord with what you found in working with the string > reduction problem which you discussed in a previous thread. > >