Re: Pure Function for String Selection
- To: mathgroup at smc.vnet.net
- Subject: [mg61594] Re: Pure Function for String Selection
- From: "dkr" <dkrjeg at adelphia.net>
- Date: Sun, 23 Oct 2005 05:45:57 -0400 (EDT)
- References: <dht479$hl1$1@smc.vnet.net><di7ra7$kov$1@smc.vnet.net> <diabsp$ijr$1@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
Edson,

Below we compare three approaches to your string filtering problem:

  filter10     (Maxim Rytin's approach)
  filter6Alt2  (A slightly amended version of my filter6 approach,
                discussed previously)
  filter9      (A new, single string approach)
_________________

selectString10Q[str_String]:=
    StringFreeQ[str,RegularExpression["1{8}|X{6}|2{6}"]];

filter10[origList:{__String}]:=Select[origList,selectString10Q];
_________________

selectString6Alt2Q[str_String]:=
    StringCases[str,{"XXXXXX","222222","11111111"},1]==={};

filter6Alt2[origList:{__String}]:=Select[origList,selectString6Alt2Q];
_________________

filter9[origList:{__String}]:=
    StringCases[
      StringReplace[ToString[origList],
        RegularExpression["1{8}|X{6}|2{6}"]:>""],
      RegularExpression[
        StringJoin["\\w{",#,",",#,"}"]&[
          ToString[StringLength[First[origList]]]]]];

Here we form a single string from the original list of strings (though
unlike our previous filter7 case, we do not explicitly insert list
braces as delimiters), then replace all bad runs in this string with "",
and then, via StringCases, pick out all remaining runs of word
characters whose length is equal to the common length of the original
strings. The runs corresponding to the bad original strings will not be
picked out because their length has been reduced by the replacement
operation.
_________________

In[1]:=
makeList[strLen_,listLen_]:=
    Table[StringJoin[{"1","2","X"}[[
      Table[Random[Integer,{1,3}],{strLen}]]]],{listLen}];

In[2]:=
SeedRandom[1234];
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];

Filter    origList    Time(secs)*
6Alt2     egList4     0.935
9         egList4     1.33
10        egList4     1.035
6Alt2     egList3     0.63
9         egList3     0.59
10        egList3     0.585
6Alt2     egList2     0.06
9         egList2     0.03
10        egList2     0.025

* Average of two runs. Mathematica was restarted before each run for
  each filter.
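
To make the single string approach (filter9) described above concrete, here is a small worked example that runs its three steps one at a time. The list `sample` and the names `joined`, `cleaned`, `n`, and `kept` are made up for illustration and are not part of the filters being timed; this is only a sketch of the idea on 8-character strings.

```mathematica
(* Hypothetical 8-character sample; the middle string contains the bad run "XXXXXX". *)
sample = {"12X12X12", "1XXXXXX2", "21X21X21"};

(* Step 1: turn the whole list into one string. *)
joined = ToString[sample];
(* joined is "{12X12X12, 1XXXXXX2, 21X21X21}" *)

(* Step 2: delete every bad run; the middle entry shrinks from 8 to 2 characters. *)
cleaned = StringReplace[joined, RegularExpression["1{8}|X{6}|2{6}"] :> ""];
(* cleaned is "{12X12X12, 12, 21X21X21}" *)

(* Step 3: keep only the runs of word characters that still have the full length. *)
n = ToString[StringLength[First[sample]]];
kept = StringCases[cleaned, RegularExpression["\\w{" <> n <> "," <> n <> "}"]]
(* kept is {"12X12X12", "21X21X21"} *)
```

The shortened entry "12" no longer has 8 word characters in a row, so the final StringCases skips it. The approach relies on all strings in the list having the same length.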
In[2]:=
SeedRandom[5678];
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];

Filter    origList    Time(secs)*
6Alt2     egList4     0.935
9         egList4     1.305
10        egList4     1.04
6Alt2     egList3     0.645
9         egList3     0.65
10        egList3     0.58
6Alt2     egList2     0.07
9         egList2     0.03
10        egList2     0.03

* Average of two runs. Mathematica was restarted before each run for
  each filter.

Based upon these meager test results, there doesn't appear to be a whole
lot of difference between the filters, except that the single string
method may lag behind a bit for problems the size of egList4.

One thing I have noted in my testing is that it is faster to use
patterns like RegularExpression["1{8}|X{6}|2{6}"] or
"XXXXXX"|"222222"|"11111111" than it is to use
X~~X~~X~~X~~X~~X|2~~2~~2~~2~~2~~2|1~~1~~1~~1~~1~~1~~1~~1.
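
For anyone wanting to check the remark about pattern forms on their own machine, one possible way to set up the comparison is sketched below. The names `testList`, `patRegex`, `patLiteral`, `patChained`, `timeFilter`, and `results` are hypothetical, the chained pattern is written with explicit quotes and parentheses so that each run is grouped before the alternatives are formed, and this is not the exact procedure behind the timings reported in this post. Actual times will depend on the machine and the Mathematica version.

```mathematica
(* Hypothetical test data: 20000 random strings of length 30 over {"1","2","X"}. *)
SeedRandom[1234];
testList = Table[
   StringJoin[{"1", "2", "X"}[[Table[Random[Integer, {1, 3}], {30}]]]],
   {20000}];

(* Three ways of writing the same set of bad runs. *)
patRegex = RegularExpression["1{8}|X{6}|2{6}"];
patLiteral = "XXXXXX" | "222222" | "11111111";
patChained = ("X" ~~ "X" ~~ "X" ~~ "X" ~~ "X" ~~ "X") |
   ("2" ~~ "2" ~~ "2" ~~ "2" ~~ "2" ~~ "2") |
   ("1" ~~ "1" ~~ "1" ~~ "1" ~~ "1" ~~ "1" ~~ "1" ~~ "1");

(* Time one pass of Select with StringFreeQ for a given pattern.
   Returns {time reported by Timing, number of strings kept}. *)
timeFilter[pat_] := Module[{t, res},
   {t, res} = Timing[Select[testList, StringFreeQ[#, pat] &]];
   {t, Length[res]}];

results = timeFilter /@ {patRegex, patLiteral, patChained}
```

The second element of each result should be the same for all three patterns, since they describe the same bad runs; only the first element (the time) is expected to differ.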