Re: Pure Function for String Selection
- To: mathgroup at smc.vnet.net
- Subject: [mg61915] Re: Pure Function for String Selection
- From: "dkr" <dkrjeg at adelphia.net>
- Date: Fri, 4 Nov 2005 05:11:40 -0500 (EST)
Edson,
Here is one last crack at your filtering problem. It is much simpler
than my previous filters and very competitive in terms of speed: in the
timings below it is the fastest filter in every case but one.
filter11[origList:{__String}]:=
StringCases[ToString[origList],
RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];
We simply form a master string from your list of strings using
ToString, and then use a Regular Expression to weed out the original
strings with bad runs.
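As a quick illustration, here is filter11 applied to a small hand-made list. The list is made up for this example and is not one of your test lists. The first and fourth strings are free of bad runs, the second contains XXXXXX and the third contains 222222.

```mathematica
(* filter11 as defined above *)
filter11[origList:{__String}] :=
  StringCases[ToString[origList],
    RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

(* hypothetical sample list, made up for illustration *)
sample = {"12X", "XXXXXX1", "12222221", "1212"};

ToString[sample]
(* the master string: "{12X, XXXXXX1, 12222221, 1212}" *)

filter11[sample]
(* expected: {"12X", "1212"} *)
```

Note that ToString drops the quotation marks and separates the elements with ", ", which is exactly what the word boundaries in the pattern rely on.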
Explanation of the regular expression:
If we wanted to simply pull out the original strings from the master
string, we could do this using
StringCases[ToString[origList], RegularExpression["\\b[^,]+\\b"]];
The regular expression describes strings that lie between word
boundaries and consist of one or more characters that are not commas.
In the master string, the left-hand boundary falls just after either
the opening brace { or a space, and the right-hand boundary falls just
before either a comma or the closing brace }. (This works because your
strings are built only from 1, 2 and X, which are all word characters,
and contain no commas.) Since [^,]+ is greedy, it matches as long a
string as possible, so your original strings are recovered. Then, to
generate only those that don't contain bad runs, we insert the
"negative lookahead" condition (?![^,]*(XXXXXX|222222|11111111)). It
requires that the text following the current position cannot begin
with zero or more non-comma characters followed by a bad run; in other
words, the current string (everything up to the next comma or the end
of the master string) must not contain a bad run. This suffices to rule
out your bad strings. Since I am a novice as far as regular expressions
go, it is likely that someone can suggest an alternative regular
expression that will be even faster.
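To see what the lookahead adds, the sketch below runs the plain extraction pattern and the full filter11 pattern on the same master string (the sample list is made up for illustration). As a cross-check, which is my addition and not part of filter11, it also computes the answer the straightforward way, testing each string separately with StringFreeQ against the three bad runs; selectFilter is a hypothetical name used only here.

```mathematica
sample = {"12X", "XXXXXX1", "12222221", "1212", "X11111111"};
master = ToString[sample];

(* plain extraction: recovers every original string *)
StringCases[master, RegularExpression["\\b[^,]+\\b"]]
(* expected: {"12X", "XXXXXX1", "12222221", "1212", "X11111111"} *)

(* with the negative lookahead: strings containing a bad run are skipped *)
StringCases[master,
  RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]]
(* expected: {"12X", "1212"} *)

(* hypothetical cross-check: test each string separately, no master string *)
selectFilter[origList:{__String}] :=
  Select[origList, StringFreeQ[#, "XXXXXX" | "222222" | "11111111"] &];
selectFilter[sample]
(* expected: {"12X", "1212"} *)
```

The only difference between the first two calls is the lookahead; at the start of each bad string the lookahead fails, and there is no other word boundary inside the string where a match could begin.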
Below I have repeated the tables from my previous message, adding a
line for filter11 to each table.
In[1]:=
makeList[strLen_,listLen_]:=
  (* listLen random strings, each of strLen characters drawn from "1", "2", "X" *)
  Table[StringJoin[{"1","2","X"}[[Table[Random[Integer,{1,3}],{strLen}]]]],
    {listLen}];
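A quick way to check what makeList produces is sketched below; the seed and sizes are arbitrary and chosen only for illustration. Every string should have length strLen and use only the characters 1, 2 and X.

```mathematica
(* makeList as defined in In[1] *)
makeList[strLen_, listLen_] :=
  Table[StringJoin[{"1","2","X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]], {listLen}];

SeedRandom[1];
small = makeList[5, 3];   (* 3 random strings of length 5 *)

Length[small]                                               (* 3 *)
Union[StringLength /@ small]                                (* {5} *)
Complement[Union @@ (Characters /@ small), {"1", "2", "X"}] (* {} *)
```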
In[2]:=
SeedRandom[1234];
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];
Filter  origList  Time (secs)*
6Alt2   egList4   0.935
9       egList4   1.33
10      egList4   1.035
11      egList4   0.91
6Alt2   egList3   0.63
9       egList3   0.59
10      egList3   0.585
11      egList3   0.475
6Alt2   egList2   0.06
9       egList2   0.03
10      egList2   0.025
11      egList2   0.02
* Average of two runs. Mathematica was restarted before each run for
each filter.
In[2]:=
SeedRandom[5678];
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];
Filter  origList  Time (secs)*
6Alt2   egList4   0.935
9       egList4   1.305
10      egList4   1.04
11      egList4   0.975
6Alt2   egList3   0.645
9       egList3   0.65
10      egList3   0.58
11      egList3   0.465
6Alt2   egList2   0.07
9       egList2   0.03
10      egList2   0.03
11      egList2   0.025
* Average of two runs. Mathematica was restarted before each run for
each filter.
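For anyone who wants to reproduce timings of this kind, a minimal sketch follows. It repeats the definitions of makeList and filter11 so it stands alone, and generates the lists in the same order as In[2] above; the helper name timeFilter is hypothetical. Absolute times depend on the machine and Mathematica version, and unlike the tables above this does not restart the kernel between runs. It also checks that filter11 returns the same strings, in the same order, as a per-string StringFreeQ test.

```mathematica
(* definitions repeated from above so the block stands alone *)
makeList[strLen_, listLen_] :=
  Table[StringJoin[{"1","2","X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]], {listLen}];
filter11[origList:{__String}] :=
  StringCases[ToString[origList],
    RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

(* hypothetical helper: CPU time of one run, discarding the result *)
timeFilter[f_, list_] := First[Timing[f[list];]]

(* generate the lists in the same order as In[2] *)
SeedRandom[1234];
egList1 = makeList[14, 1000];
egList2 = makeList[30, 1000];
egList3 = makeList[30, 20000];

timeFilter[filter11, egList3]

(* sanity check: filter11 should agree with a per-string StringFreeQ test *)
filter11[egList3] ===
  Select[egList3, StringFreeQ[#, "XXXXXX" | "222222" | "11111111"] &]
(* expected: True *)
```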
Thus, as with your earlier string reduction problem, using a master
string and exploiting Mathematica's powerful string pattern capabilities
may be a useful approach, especially when coupled with Maxim Rytin's
excellent suggestion of using regular expressions. I don't believe
there is an analogue in Mathematica's StringExpression for the type of
lookahead condition that was used in filter11.
dkr