Re: Number of Words in a String
- To: mathgroup at smc.vnet.net
- Subject: [mg102565] Re: Number of Words in a String
- From: ADL <alberto.dilullo at tiscali.it>
- Date: Fri, 14 Aug 2009 05:59:14 -0400 (EDT)
- References: <h5r8b5$l4q$1@smc.vnet.net>
With respect to the use of regular expressions versus Mathematica string patterns, note that the former are roughly 20-30% faster in the timings below:

In[1]:= NN = 1000000;

For word splitting:

In[2]:= StringSplit["The cat in a hat, (not on the mat)??.", RegularExpression["[^A-Za-z]+"]]

Out[2]= {The,cat,in,a,hat,not,on,the,mat}

In[3]:= Timing[Do[StringSplit["The cat in a hat, (not on the mat)??.", RegularExpression["[^A-Za-z]+"]];, {NN}]]

Out[3]= {9.999,Null}

In[4]:= StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..]

Out[4]= {The,cat,in,a,hat,not,on,the,mat}

In[5]:= Timing[Do[StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..];, {NN}]]

Out[5]= {12.808,Null}

So the regex version takes about 22% less time (9.999 s versus 12.808 s).

For word counting:

In[6]:= StringCount["The cat in a hat, (not on the mat)??.", RegularExpression["[A-Za-z]+"]]

Out[6]= 9

In[7]:= Timing[Do[StringCount["The cat in a hat, (not on the mat)??.", RegularExpression["[A-Za-z]+"]];, {NN}]]

Out[7]= {6.396,Null}

In[8]:= StringCount["The cat in a hat, (not on the mat)??.", WordCharacter ..]

Out[8]= 9

In[9]:= Timing[Do[StringCount["The cat in a hat, (not on the mat)??.", WordCharacter ..];, {NN}]]

Out[9]= {9.438,Null}

So the regex version takes about 32% less time (6.396 s versus 9.438 s).

ADL

On Aug 11, 9:58 am, Gregory Lypny <gregory.ly... at videotron.ca> wrote:
> Hello everyone,
>
> Is this the simplest way to find the number of words in a string?
> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
>
> Gregory
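
On the function question in the quoted message: the error described there is probably caused by writing the pattern myString_ inside the function body. The trailing underscore belongs only on the left-hand side of the definition; the body should refer to the plain name. A minimal sketch, in which the names wordCount and wordCountStream are hypothetical and not taken from the thread:

```mathematica
(* wordCount and wordCountStream are hypothetical names, not from the thread *)

(* Regex version: count maximal runs of ASCII letters *)
wordCount[s_String] := StringCount[s, RegularExpression["[A-Za-z]+"]]

(* Stream version of the quoted approach; the stream is closed after reading *)
wordCountStream[s_String] := Module[{strm = StringToStream[s], n},
  n = Length[ReadList[strm, Word]];
  Close[strm];
  n]

wordCount["The cat in the hat."]         (* 5 *)
wordCountStream["The cat in the hat."]   (* 5, as in the quoted message *)
```

Note that the two pattern styles timed above are not strictly equivalent: WordCharacter matches letters and digits, while [A-Za-z] matches ASCII letters only, so the counts can differ on strings containing digits or accented letters.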