Re: Number of Words in a String
- To: mathgroup at smc.vnet.net
- Subject: [mg102565] Re: Number of Words in a String
- From: ADL <alberto.dilullo at tiscali.it>
- Date: Fri, 14 Aug 2009 05:59:14 -0400 (EDT)
- References: <h5r8b5$l4q$1@smc.vnet.net>
With respect to the use of regular expressions or Mathematica string
patterns, note that the former are roughly 20-30% faster in the timings below:
In[1]:= NN = 1000000;
For word splitting:
In[2]:= StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}
In[3]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}
In[4]:= StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}
In[5]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}
So the regex version takes about 22% less time for splitting.
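Note that the two patterns compared here are not strictly equivalent: WordCharacter is documented to match letters and digits, while [A-Za-z] matches only unaccented ASCII letters. A small sketch of the difference (the test string is made up for illustration and is not part of the timings):

```mathematica
(* Illustrative input, chosen because it contains digits *)
s = "route 66";

(* Only runs of ASCII letters count as words: finds "route" *)
StringCount[s, RegularExpression["[A-Za-z]+"]]

(* Runs of letters or digits count as words: finds "route" and "66" *)
StringCount[s, WordCharacter ..]
```

The timings therefore compare like with like only for input such as the sample sentence, which contains no digits or accented letters.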
For word counting:
In[6]:= StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]]
Out[6]= 9
In[7]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}
In[8]:= StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..]
Out[8]= 9
In[9]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}
So the regex version takes about 32% less time for counting.
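On turning this into a function: the error described in the quoted message below is probably caused by using the pattern myString_ on the right-hand side as well; the underscore belongs only in the argument pattern on the left. A minimal sketch, where the names wordCount and wordCountStream are hypothetical and chosen only for this example:

```mathematica
(* Hypothetical helper: counts maximal runs of ASCII letters as words.
   s_String appears only on the left; the body uses plain s. *)
wordCount[s_String] := StringCount[s, RegularExpression["[A-Za-z]+"]]

(* The ReadList approach from the original question, wrapped the same way.
   The stream is closed explicitly so it does not stay open. *)
wordCountStream[s_String] := Module[{stream = StringToStream[s], words},
  words = ReadList[stream, Word];
  Close[stream];
  Length[words]]

wordCount["The cat in a hat, (not on the mat)??."]  (* 9, as in Out[6] *)
wordCountStream["The cat in the hat."]              (* 5, as in the question *)
```

The two helpers can disagree on other input, since ReadList splits on whitespace and keeps punctuation attached to words, while the regex counts only runs of letters.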
ADL
On Aug 11, 9:58 am, Gregory Lypny <gregory.ly... at videotron.ca> wrote:
> Hello everyone,
>
> Is this the simplest way to find the number of words in a string?
> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
>
> Gregory