MathGroup Archive 2005


Re: Pure Function for String Selection


Edson,

Here is one last crack at your filtering problem. It is much simpler
than my previous filters and very competitive in terms of speed.

filter11[origList:{__String}] :=
    StringCases[ToString[origList],
      RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

We simply form a master string from your list of strings using
ToString, and then use a Regular Expression to weed out the original
strings with bad runs.
Explanation of the regular expression:
If we simply wanted to pull out the original strings from the master
string, we could do this using
StringCases[ToString[origList], RegularExpression["\\b[^,]+\\b"]];
This regular expression matches strings that lie between word
boundaries (in this example the left-hand boundary falls just after
either { or a space, while the right-hand boundary falls just before
either a comma or the closing brace }) and that consist of one or more
characters that are not commas. Because [^,]+ matches as long a string
as possible, your original strings are recovered. (This relies on the
original strings consisting only of word characters, here 1, 2 and X,
so that no word boundary falls inside one of them.) Then, to keep only
the strings that don't contain bad runs, we insert the "negative
lookahead" condition (?![^,]*(XXXXXX|222222|11111111)). It requires
that the text that follows cannot begin with zero or more non-comma
characters followed by a bad run. This suffices to rule out your bad
strings. Since I am a novice as far as regular expressions go, it is
likely that someone can suggest an alternative regular expression that
will be even faster.
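To see the two regular expressions at work, here is a minimal sketch on a small hand-made list. The list sample and its contents are illustrative assumptions, not data from the timing runs; like the lists produced by makeList, every string consists only of the word characters 1, 2 and X, which is what the \b anchors depend on.

```mathematica
(* filter11 exactly as defined above *)
filter11[origList:{__String}] :=
    StringCases[ToString[origList],
      RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

(* hypothetical sample: the 2nd, 3rd and 5th strings contain a bad run *)
sample = {"12X12X", "1XXXXXX2", "2222221", "X1X1X1", "111111112"};

(* the master string is {12X12X, 1XXXXXX2, 2222221, X1X1X1, 111111112} *)
ToString[sample]

(* without the lookahead, every original string is recovered *)
StringCases[ToString[sample], RegularExpression["\\b[^,]+\\b"]]
(* {"12X12X", "1XXXXXX2", "2222221", "X1X1X1", "111111112"} *)

(* with the lookahead, only the strings free of bad runs survive *)
filter11[sample]
(* {"12X12X", "X1X1X1"} *)
```

Note that the lookahead is tested only at word boundaries; inside a string of word characters there are none, so a bad string cannot be matched starting partway through.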

Below I have repeated the tables from my previous message, adding a
line for filter11 to each table.

In[1]:=
 makeList[strLen_, listLen_] :=
     Table[StringJoin[{"1", "2", "X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]],
       {listLen}];
In[2]:=
 SeedRandom[1234];
 egList1 = makeList[14, 1000];
 egList2 = makeList[30, 1000];
 egList3 = makeList[30, 20000];
 egList4 = makeList[100, 20000];


 Filter    origList    Time (secs)*
 6Alt2     egList4     0.935
 9         egList4     1.33
 10        egList4     1.035
 11        egList4     0.91

 6Alt2     egList3     0.63
 9         egList3     0.59
 10        egList3     0.585
 11        egList3     0.475

 6Alt2     egList2     0.06
 9         egList2     0.03
 10        egList2     0.025
 11        egList2     0.02


* Average of two runs.  Mathematica was restarted before each run for
 each filter.


In[2]:=
 SeedRandom[5678];
 egList1=makeList[14,1000];
 egList2=makeList[30,1000];
 egList3=makeList[30,20000];
 egList4=makeList[100,20000];


 Filter    origList    Time (secs)*
 6Alt2     egList4     0.935
 9         egList4     1.305
 10        egList4     1.04
 11        egList4     0.975

 6Alt2     egList3     0.645
 9         egList3     0.65
 10        egList3     0.58
 11        egList3     0.465

 6Alt2     egList2     0.07
 9         egList2     0.03
 10        egList2     0.03
 11        egList2     0.025


* Average of two runs.  Mathematica was restarted before each run for
 each filter.

Thus, as with your earlier string reduction problem, forming a master
string and exploiting Mathematica's powerful string pattern capabilities
may be a useful approach, especially when combined with Maxim Rytin's
excellent suggestion of using regular expressions. I don't believe
Mathematica's StringExpression has an analogue of the lookahead
condition used in filter11.
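One way to check filter11's output on the random test lists is to compare it with a filter that tests each string directly with StringFreeQ and an Alternatives pattern, which needs no lookahead at all. The sketch below does this; filterRef is a hypothetical name for this comparison filter, it is not one of the numbered filters in the tables above, and no timings are claimed for it here.

```mathematica
(* makeList and filter11 as defined in this message *)
makeList[strLen_, listLen_] :=
    Table[StringJoin[{"1", "2", "X"}[[Table[Random[Integer, {1, 3}], {strLen}]]]],
      {listLen}];

filter11[origList:{__String}] :=
    StringCases[ToString[origList],
      RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

(* hypothetical reference filter: keep a string only if it
   contains none of the three bad runs *)
filterRef[origList:{__String}] :=
    Select[origList, StringFreeQ[#, "XXXXXX" | "222222" | "11111111"] &];

SeedRandom[1234];
egList3 = makeList[30, 20000];

(* the two filters should agree element for element, in order *)
filter11[egList3] === filterRef[egList3]

(* timings on your own machine, for comparison with the tables *)
Timing[filter11[egList3];]
Timing[filterRef[egList3];]
```

Because both filters preserve order and duplicates, a plain SameQ comparison is enough to confirm that the regular expression rejects exactly the strings containing a bad run.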

dkr

