Mathematica 9 is now available
Services & Resources / Wolfram Forums
-----
 /
MathGroup Archive
2005
*January
*February
*March
*April
*May
*June
*July
*August
*September
*October
*November
*December
*Archive Index
*Ask about this page
*Print this page
*Give us feedback
*Sign up for the Wolfram Insider

MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Pure Function for String Selection

  • To: mathgroup at smc.vnet.net
  • Subject: [mg61017] Re: Pure Function for String Selection
  • From: "dkr" <dkrjeg at adelphia.net>
  • Date: Fri, 7 Oct 2005 03:38:08 -0400 (EDT)
  • References: <dht479$hl1$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Edson,

Consider the following 3 predicates:

In[1]:=
selectString1Q[str_String] :=
 Module[{ch=Characters[str]},ch={First[#],Length[#]}&/@Split[ch];
      Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
        Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] <  6 &&
        Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] <  8
 ] ;

selectString2Q[str_String] :=
     Module[{ch},
      ch =StringCases[str,y:(x_)..:>{x,StringLength[y]}];
      Max[Last /@ Select[ch, MatchQ[#, {"X", _}] &]] < 6 &&
        Max[Last /@ Select[ch, MatchQ[#, {"2", _}] &]] <  6 &&
        Max[Last /@ Select[ch, MatchQ[#, {"1", _}] &]] <  8
       ] ;

maxseq[s_String,z_String]:=Max[StringLength/@StringCases[s,z..]];
selectString3Q[str_String]:=
    maxseq[str,"1"]<8&&maxseq[str,"X"]<6&&maxseq[str,"2"]<6;

selectString1Q was used by Jens-Peer Kuska in his previous response in
this thread.  selectString2Q appeals to a pure string pattern matching
technique to define ch rather than using Characters and Split as was
done in selectString1Q.  selectString3Q is the predicate used in my
previous response in this thread.  These 3 predicates can then be used
to define the 3 filter functions:

In[5]:=
filter1[origList:{__String}]:=Select[origList,selectString1Q[#]&];
filter2[origList:{__String}]:=Select[origList,selectString2Q[#]&];
filter3[origList:{__String}]:=Select[origList,selectString3Q[#]&];

Finally define:

In[8]:=
makeList[strLen_,listLen_]:=
    Table[StringJoin[{"1","2","X"}\[LeftDoubleBracket]
          Table[Random[

Integer,{1,3}],{strLen}]\[RightDoubleBracket]],{listLen}];

makeList generates a list of listLen strings, with each string having
length strLen and consisting of a random sequence of the three
characters "1", "2", and "X".

Now let's attempt to get a very rough feel for the relative speed of
the 3 filter functions:

In[9]:=
egList1=makeList[14,1000];
egList2=makeList[30,1000];
egList3=makeList[30,20000];
egList4=makeList[100,20000];

In[13]:=
{Timing[Length@filter3[egList1]],Timing[Length@filter2[egList1]],
  Timing[Length@filter1[egList1]]}
Out[13]=
{{0.14 Second,986},{0.42 Second,986},{0.33 Second,986}}

In[14]:=
{Timing[Length@filter1[egList2]],Timing[Length@filter3[egList2]],
  Timing[Length@filter2[egList2]]}
Out[14]=
{{0.72 Second,958},{0.09 Second,958},{0.37 Second,958}}

In[15]:=
{Timing[Length@filter2[egList3]],Timing[Length@filter1[egList3]],
  Timing[Length@filter3[egList3]]}
Out[15]=
{{7.85 Second,19057},{7.41 Second,19057},{1.68 Second,19057}}

In[16]:=
{Timing[Length@filter2[egList4]],Timing[Length@filter3[egList4]],
  Timing[Length@filter1[egList4]]}
Out[16]=
{{21. Second,16645},{3.03 Second,16645},{21.11 Second,16645}}

(Note that the filter functions appear in a different order in each of
the four inputs.)

In all cases filter3 is the clear winner, with the two other filter
functions being fairly close.  The main difference between filter3 and
the other two is that it makes three separate passes on each string
with StringCases in order to determine the runs, while the other two
functions essentially generate a single list of runs for all the 3
characters combined (filter1 by making a single pass with Split and
filter 2 a single pass with StringCases), and then have to go back and
identify which runs belong to which character from this generated set.
In any event, if these timing results survive more extensive testing,
it may suggest that working with your strings directly rather than
decomposing your strings via Characters may be the way to go.  This
would be in accord with what you found in working with the string
reduction problem which you discussed in a previous thread.


  • Prev by Date: Re: Re: Interactive surface manipulation with GUIkit
  • Next by Date: Re: Re: Visualization site updates
  • Previous by thread: Re: Pure Function for String Selection
  • Next by thread: Re: Re: Pure Function for String Selection