Re: Pure Function for String Selection

To: mathgroup at smc.vnet.net
Subject: [mg61915] Re: Pure Function for String Selection
From: "dkr" <dkrjeg at adelphia.net>
Date: Fri, 4 Nov 2005 05:11:40 -0500 (EST)
References: <dht479$hl1$1@smc.vnet.net> <di7rft$kq3$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com

Edson,

Here is one last crack at your filtering problem. It is much simpler than my previous filters and very competitive in terms of speed:

    filter11[origList:{__String}] :=
      StringCases[ToString[origList],
        RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

We simply form a master string from your list of strings using ToString, and then use a regular expression to weed out the original strings that contain bad runs.

Explanation of the regular expression: if we simply wanted to pull the original strings out of the master string, we could use

    StringCases[ToString[origList], RegularExpression["\\b[^,]+\\b"]]

This regular expression describes substrings that lie between word boundaries and consist of one or more characters that are not commas. In this example, a left-hand word boundary is preceded by either "{" or a space, while a right-hand word boundary is followed by either a comma or the closing "}". Because [^,]+ matches as long a string as possible, the matches are your original strings. For instance, ToString[{"12X", "21X"}] is the string "{12X, 21X}", from which the pattern extracts "12X" and "21X".

To generate only the strings without bad runs, we insert the "negative lookahead" condition (?![^,]*(XXXXXX|222222|11111111)) right after the opening \b. It requires that the text that follows cannot begin with zero or more non-comma characters followed by a bad run. Since [^,]* cannot run past the comma that ends the current string, this suffices to rule out your bad strings.

Since I am a novice as far as regular expressions go, it is likely that someone can suggest an alternative regular expression that will be even faster.

Below I have repeated the tables from my previous message, adding a line for filter11 to each table.
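Before the timing tables, here is a minimal sketch of the same master-string idea outside Mathematica, using Python's re module. That module accepts the same \b and (?!...) syntax. The names BAD_RUNS, master_string and filter11_sketch are illustrative only. master_string imitates the "{a, b, c}" form that ToString produces. The sketch assumes, as in this thread, that every string is non-empty and built only from word characters (1, 2, X), so no word boundary falls inside a string. One Python-specific change: the alternation is written as a non-capturing group (?:...), because re.findall returns group contents rather than whole matches when the pattern contains a capturing group.

```python
import re

# Bad runs from the thread: six X's, six 2's, or eight 1's.
BAD_RUNS = ("XXXXXX", "222222", "11111111")

# filter11's pattern; (?:...) is non-capturing so that re.findall returns
# whole matches instead of the contents of the group in the lookahead.
PATTERN = re.compile(r"\b(?![^,]*(?:" + "|".join(BAD_RUNS) + r"))[^,]+\b")


def master_string(strings):
    """Imitate Mathematica's ToString on a list of strings: "{a, b, c}"."""
    return "{" + ", ".join(strings) + "}"


def filter11_sketch(strings):
    """Keep the strings containing no bad run, scanning one master string.

    Assumes every string is non-empty and made only of word characters
    (here 1, 2 and X), so no word boundary falls inside a string.
    """
    return PATTERN.findall(master_string(strings))


if __name__ == "__main__":
    sample = ["12X12X", "1XXXXXX2", "2222221", "X11111111", "21X"]
    print(filter11_sketch(sample))  # ['12X12X', '21X']
```

If the strings could contain commas or spaces, this would break. Both [^,] and \b rely on the ", " separator being the only non-word text between strings, so a different separator and pattern would be needed.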
    In[1]:= makeList[strLen_, listLen_] :=
              Table[StringJoin[{"1", "2", "X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]], {listLen}];

    In[2]:= SeedRandom[1234];
            egList1 = makeList[14, 1000];
            egList2 = makeList[30, 1000];
            egList3 = makeList[30, 20000];
            egList4 = makeList[100, 20000];

    Filter   origList   Time (secs)*
    6Alt2    egList4    0.935
    9        egList4    1.33
    10       egList4    1.035
    11       egList4    0.91
    6Alt2    egList3    0.63
    9        egList3    0.59
    10       egList3    0.585
    11       egList3    0.475
    6Alt2    egList2    0.06
    9        egList2    0.03
    10       egList2    0.025
    11       egList2    0.02

* Average of two runs. Mathematica was restarted before each run for each filter.

    In[2]:= SeedRandom[5678];
            egList1 = makeList[14, 1000];
            egList2 = makeList[30, 1000];
            egList3 = makeList[30, 20000];
            egList4 = makeList[100, 20000];

    Filter   origList   Time (secs)*
    6Alt2    egList4    0.935
    9        egList4    1.305
    10       egList4    1.04
    11       egList4    0.975
    6Alt2    egList3    0.645
    9        egList3    0.65
    10       egList3    0.58
    11       egList3    0.465
    6Alt2    egList2    0.07
    9        egList2    0.03
    10       egList2    0.03
    11       egList2    0.025

* Average of two runs. Mathematica was restarted before each run for each filter.

Thus, as with your earlier string reduction problem, using a master string and exploiting Mathematica's powerful string pattern capabilities may be a useful approach, especially when coupled with Maxim Rytin's excellent suggestion of using regular expressions. I don't believe there is an analogue in Mathematica's StringExpression for the type of lookahead condition that was used in filter11.

dkr
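For anyone wanting to try the master-string comparison outside Mathematica, here is a rough Python analogue of the benchmark setup. It is a sketch under stated assumptions, not a reproduction of the tables. make_list imitates makeList, but random.Random(1234) does not reproduce the lists from Mathematica's SeedRandom[1234]. per_string_filter is a hypothetical baseline that searches each string separately. master_string_filter is the one-pass version built on filter11's pattern, with a non-capturing group for re.findall. The script checks that both filters agree before timing them. It prints timings, but no results are claimed here, and Python timings say nothing about Mathematica's.

```python
import random
import re
import timeit

BAD_RUNS = ("XXXXXX", "222222", "11111111")
BAD = re.compile("|".join(BAD_RUNS))
# Same pattern as filter11, with a non-capturing group for re.findall.
MASTER = re.compile(r"\b(?![^,]*(?:" + "|".join(BAD_RUNS) + r"))[^,]+\b")


def make_list(str_len, list_len, rng):
    """Rough analogue of makeList: list_len strings of str_len random chars from "12X"."""
    return ["".join(rng.choice("12X") for _ in range(str_len))
            for _ in range(list_len)]


def per_string_filter(strings):
    """Hypothetical baseline: search each string separately for a bad run."""
    return [s for s in strings if not BAD.search(s)]


def master_string_filter(strings):
    """One regex pass over a single "{a, b, ...}" master string."""
    return MASTER.findall("{" + ", ".join(strings) + "}")


if __name__ == "__main__":
    # Seed 1234 echoes SeedRandom[1234] but yields different lists.
    rng = random.Random(1234)
    eg_list3 = make_list(30, 20000, rng)  # same shape as egList3
    assert per_string_filter(eg_list3) == master_string_filter(eg_list3)
    for f in (per_string_filter, master_string_filter):
        # Average of two runs (no interpreter restart between runs).
        secs = timeit.timeit(lambda: f(eg_list3), number=2) / 2
        print(f"{f.__name__}: {secs:.3f} s per run")
```

The equality check matters more than the timings. It confirms, on random data, the assumption that no word boundary falls inside a string made of 1s, 2s and Xs, which is what lets a single pass over the master string stand in for testing each string on its own.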