MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Pure Function for String Selection

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61085] Re: Pure Function for String Selection
  • From: "dkr" <dkrjeg at adelphia.net>
  • Date: Sun, 9 Oct 2005 01:36:38 -0400 (EDT)
  • References: <dht479$hl1$1@smc.vnet.net><200510070738.DAA03287@smc.vnet.net> <di7ra7$kov$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Edson,

Congratulations.  Your approach is very elegant.  I like it very much.

The main attribute of your approach is that it will stop examining a
string as soon as it has found the first "bad" run, while filter1,
filter2, and filter3 all share the undesirable trait of identifying ALL
bad runs in a string.  I think this attribute is a major source of the
relative efficiency of your approach.

Consider the following alternative approaches:

selectString5Q[str_String]:=
     StringFreeQ[str,{"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1",
    "X"~~"X"~~"X"~~"X"~~"X"~~"X",
      "2"~~"2"~~"2"~~"2"~~"2"~~"2"}];

This approach is a minor variant of your approach, using the fact that
StringFreeQ can accomodate a list of patterns for its second argument.

selectString6Q[str_String]:=
    StringCases[
        str,{"X"~~"X"~~"X"~~"X"~~"X"~~"X","2"~~"2"~~"2"~~"2"~~"2"~~"2",
          "1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"},1]==={};

This approach attempts to remedy the undesirable trait of the earlier
approaches by stopping after it finds the first bad run.

The two new approaches above lead to :

filter5[origList:{__String}]:=Select[origList,selectString5Q];
filter6[origList:{__String}]:=Select[origList,selectString6Q];

(Note we can get away with, eg., selectString5Q rather than
selectString5Q[#]& as the second arg of Select, though the latter does
no harm.)

Finally we use an approach analogous to the one we used for your
earlier string reduction problem.  We take your list of strings and
form one large string with appropriate delimiters, and then turn the
string pattern matching techniques loose on this large string.

filter7[origList:{__String}]:=
    StringSplit[
      StringReplace[ToString[{#}]&/@origList//StringJoin,
        StringExpression[
            "{",(WordCharacter)...,{"X"~~"X"~~"X"~~"X"~~"X"~~"X",
              "2"~~"2"~~"2"~~"2"~~"2"~~"2",

"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"},(WordCharacter)...,"}"]:>
          ""],Characters["{}"]..];

ToString[{#}]&/@origList//StringJoin  turns your original list of
strings into a single string, enclosing each individual string in
brackets.  This is identical to what we did in the previous string
reduction thread. Then we run StringReplace on this long string,
essentially weeding out any of the original strings which have a bad
run.  Finally, we use StringSplit to eliminate the delimiters and give
us back a list of the remaining strings.

Comparing these three approaches along with filter4 from your message
we find:

In[8]:=
makeList[strLen_,listLen_]:=

    Table[StringJoin[{"1","2","X"}\[LeftDoubleBracket]
Table[Random[
Integer,{1,3}],{strLen}]\[RightDoubleBracket]],{listLen}];
In[9]:=
SeedRandom[1234];
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];

In[14]:=
{Timing[Length@filter4[egList1]],Timing[Length@filter5[egList1]],
  Timing[Length@filter6[egList1]],Timing[Length@filter7[egList1]]}
Out[14]=
{{0.13 Second,981},{0.09 Second,981},{0.09 Second,981},{0.04
Second,981}}
In[15]:=
{Timing[Length@filter7[egList2]],Timing[Length@filter4[egList2]],
  Timing[Length@filter5[egList2]],Timing[Length@filter6[egList2]]}
Out[15]=
{{0.06 Second,952},{0.11 Second,952},{0.09 Second,952},{0.1
Second,952}}
In[16]:=
{Timing[Length@filter6[egList3]],Timing[Length@filter7[egList3]],
  Timing[Length@filter4[egList3]],Timing[Length@filter5[egList3]]}
Out[16]=
{{1.27 Second,18989},{0.63 Second,18989},{1.06 Second,18989},{0.94
Second,
    18989}}
In[17]:=
{Timing[Length@filter5[egList4]],Timing[Length@filter6[egList4]],
  Timing[Length@filter7[egList4]],Timing[Length@filter4[egList4]]}
Out[17]=
{{1.61 Second,16625},{1.26 Second,16625},{1.48 Second,16625},{1.31
Second,
    16625}}

(Note that I have rotated the order of appearance of the functions in
the lists.  The Timing function in Mathematica is sensitive to the
state of the Mathematica session when it is invoked.  Functions later
in the list might benefit from internal tables and the like which
Mathematica may have created in computing previous elements of the
list.  Presumably being last in the list is the most advantageous. By
rotating the placement of the functions in the list, I hope to minimize
such bias. Of course the best way would be to restart Mathematica for
each timing, with SeedRandom being used to insure the values of egList1
etc. remain the same from session to session.)

The results seems to favor filter7.  It does slip a bit on egList4, but
I think most of the time is being taken up (see below) by inserting
delimiters at the beginning and then deleting them at the end.  I
haven't really paid much attention to these two aspects, and presumably
they could be speeded up by taking advantage of special structure in
the problem, such as that all the strings in the original list have the
same length. I give that some additional thought.

(Here is the time to change the list into a string with appropriate
delimiters)

In[18]:=
Timing[ToString[{#}]&/@egList4//StringJoin;]

Out[18]=
{0.58 Second,Null}

In[21]:=
xxx=StringReplace[ToString[{#}]&/@egList4//StringJoin,
      StringExpression[
          "{",(WordCharacter)...,{"X"~~"X"~~"X"~~"X"~~"X"~~"X",
            "2"~~"2"~~"2"~~"2"~~"2"~~"2",

"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"~~"1"},(WordCharacter)...,"}"]:>
        ""];

(Here is the time to reverse the process and get back to a list of
remaining strings)

In[22]:=
Timing[StringSplit[xxx,Characters["{}"]..];]
Out[22]=
{0.7 Second,Null}

dkr


  • Prev by Date: V5.2 or 5.1 slower than v.5.0 for NIntegrate ?
  • Next by Date: Sundry Questions
  • Previous by thread: Re: Re: Pure Function for String Selection
  • Next by thread: Re: Pure Function for String Selection