MathGroup Archive 2009

Re: Number of Words in a String

  • To: mathgroup at smc.vnet.net
  • Subject: [mg102565] Re: Number of Words in a String
  • From: ADL <alberto.dilullo at tiscali.it>
  • Date: Fri, 14 Aug 2009 05:59:14 -0400 (EDT)
  • References: <h5r8b5$l4q$1@smc.vnet.net>

Regarding the choice between regular expressions and Mathematica string
patterns, note that the former are about 20-30% faster in the timings below:

In[1]:= NN = 1000000;

For word splitting:

In[2]:= StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}

In[3]:= Timing[
 Do[StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}

In[4]:= StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}

In[5]:= Timing[
 Do[StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}

so the regex version takes about 22% less time (9.999 s vs. 12.808 s).
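
One caveat when comparing the two, as far as I can tell: the patterns are
not strictly equivalent. WordCharacter matches digits as well as letters,
whereas [A-Za-z] matches ASCII letters only, so they agree on the test
sentence but can differ on other input. A small sketch (the string
"abc123 def" and the name sample are made up for illustration):

```mathematica
(* sample is a hypothetical test string containing digits *)
sample = "abc123 def";

(* Split on runs of anything that is not an ASCII letter *)
StringSplit[sample, RegularExpression["[^A-Za-z]+"]]

(* Split on runs of anything that is not a letter or digit *)
StringSplit[sample, Except[WordCharacter] ..]
```

I would expect the first call to give {abc, def} and the second
{abc123, def}, so which one counts as "words" depends on the text at hand.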


For word counting:

In[6]:= StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]]
Out[6]= 9

In[7]:= Timing[
 Do[StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}

In[8]:= StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..]
Out[8]= 9

In[9]:= Timing[
 Do[StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}

so the regex version takes about 32% less time (6.396 s vs. 9.438 s).
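
As for turning this into a function, here is a minimal sketch. The name
wordCount is just my choice, not a built-in. The point is that the pattern
s_String appears only on the left-hand side of the definition, while the
plain name s is used on the right:

```mathematica
(* wordCount is a hypothetical helper name.
   s_String restricts the argument to strings; the body refers to it as s *)
wordCount[s_String] := StringCount[s, RegularExpression["[A-Za-z]+"]]

(* Usage with the sentence from the original question *)
wordCount["The cat in the hat."]
```

For the original example this should return 5, matching the ReadList
approach quoted below.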


ADL


On Aug 11, 9:58 am, Gregory Lypny <gregory.ly... at videotron.ca> wrote:
> Hello everyone,
>
> Is this the simplest way to find the number of words in a string?  
> Seems a little complicated, and I can't seem to turn it into a  
> function because when I replace the string with the argument  
> placeholder myString_ I get an error message saying that a string is  
> expected in that spot.
>
>         Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
>         Returns 5.
>
> Gregory


