MathGroup Archive 2005

Re: Pure Function for String Selection

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61915] Re: Pure Function for String Selection
  • From: "dkr" <dkrjeg at adelphia.net>
  • Date: Fri, 4 Nov 2005 05:11:40 -0500 (EST)
  • References: <dht479$hl1$1@smc.vnet.net><di7rft$kq3$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Edson,

Here is one last crack at your filtering problem. It is much simpler
than my previous filters and very competitive in terms of speed.

filter11[origList:{__String}]:=
    StringCases[ToString[origList],
        RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]];

We simply form a master string from your list of strings using
ToString, and then use a Regular Expression to weed out the original
strings with bad runs.
Explanation of the regular expression:

If we wanted simply to pull out the original strings from the master
string, we could do this using

StringCases[ToString[origList], RegularExpression["\\b[^,]+\\b"]];

The regular expression characterizes strings that lie between word
boundaries (in this example the lefthand word boundaries take the form
of either { or whitespace, while the righthand word boundaries take the
form of either a comma or a righthand brace) and consist of 1 or more
characters that are not commas.  [^,]+ will match as large a string as
possible, and hence your original strings will be generated.

Then to generate only those that don't have bad runs we insert the
"negative lookahead" condition (?![^,]*(XXXXXX|222222|11111111)).  It
requires that the text following the lefthand word boundary not begin
with 0 or more characters that are not commas followed by a bad run.
In other words, no bad run may occur before the next comma.  This
suffices to rule out your bad strings.

Since I am a novice as far as regular expressions go, it is likely
that someone can suggest an alternative regular expression that will
be even faster.
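
To make the explanation concrete, here is a small sketch of both
regular expressions applied to a four-element list. The list sample and
the names master and kept are made up for illustration and are not part
of the original problem. The final line is only a cross-check against a
direct element-by-element test with StringFreeQ, not a proposed
replacement for filter11.

```mathematica
(* Hypothetical sample data, not taken from the original post. *)
sample = {"12X12X", "1XXXXXX2", "X222222", "21X21X"};

(* ToString turns the list into one master string:
   "{12X12X, 1XXXXXX2, X222222, 21X21X}" *)
master = ToString[sample];

(* Without the lookahead, every original string is recovered. *)
StringCases[master, RegularExpression["\\b[^,]+\\b"]]

(* With the lookahead, a match cannot start at a string that has a
   bad run before its next comma (or before the closing brace). *)
kept = StringCases[master,
    RegularExpression["\\b(?![^,]*(XXXXXX|222222|11111111))[^,]+\\b"]]

(* Cross-check against a direct element-by-element test. *)
kept === Select[sample, StringFreeQ[#, "XXXXXX" | "222222" | "11111111"] &]
```

Here the second and third strings each contain a run of six identical
characters, so the second StringCases call should leave only "12X12X"
and "21X21X", and the cross-check should agree.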

Below I have repeated the tables from my previous message, adding a
line for filter11 to each table.

In[1]:=
 makeList[strLen_,listLen_]:=
     Table[StringJoin[{"1","2","X"}[[
         Table[Random[Integer,{1,3}],{strLen}]]]],{listLen}];
 In[2]:=
 SeedRandom[1234];
 egList1=makeList[14,1000];
 egList2=makeList[30,1000];
 egList3=makeList[30,20000];
 egList4=makeList[100,20000];


Filter         origList         Time(secs)*
 6Alt2          egList4          0.935
 9              egList4          1.33
 10             egList4          1.035
 11             egList4          0.91


 6Alt2          egList3          0.63
 9              egList3          0.59
 10             egList3          0.585
 11             egList3          0.475


 6Alt2          egList2          0.06
 9              egList2          0.03
 10             egList2          0.025
 11             egList2          0.02


* Average of two runs.  Mathematica was restarted before each run for
 each filter.


In[2]:=
 SeedRandom[5678];
 egList1=makeList[14,1000];
 egList2=makeList[30,1000];
 egList3=makeList[30,20000];
 egList4=makeList[100,20000];


Filter         origList         Time(secs)*
 6Alt2          egList4          0.935
 9              egList4          1.305
 10             egList4          1.04
 11             egList4          0.975


 6Alt2          egList3          0.645
 9              egList3          0.65
 10             egList3          0.58
 11             egList3          0.465


 6Alt2          egList2          0.07
 9              egList2          0.03
 10             egList2          0.03
 11             egList2          0.025


* Average of two runs.  Mathematica was restarted before each run for
 each filter.
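
For anyone who wants to repeat a measurement of this kind, a minimal
sketch of a single timing is given below. It assumes that filter11 and
egList4 have been defined as above. The helper name timeOnce is
hypothetical, and the sketch takes one measurement only; it does not
reproduce the restart-and-average procedure described in the footnote.

```mathematica
(* Sketch only: assumes filter11 and egList4 are defined as above.
   The name timeOnce is hypothetical and not from the original post. *)

(* Timing returns {time, result}; the trailing semicolon discards the
   filtered list so that only the CPU time is kept. *)
timeOnce[f_, list_] := First[Timing[f[list];]];

timeOnce[filter11, egList4]
```

Timing reports CPU time used by the kernel, so figures obtained this
way depend on the machine and the Mathematica version.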

Thus, as with your earlier string reduction problem, using a master
string and exploiting Mathematica's powerful string pattern capabilities
may be a useful approach, especially when coupled with Maxim Rytin's
excellent suggestion of using regular expressions. I don't believe
there is an analogue in Mathematica's StringExpression for the type of
lookahead condition that was used in filter11.

dkr

