Re: Count Occurrence of words in a long text
- To: mathgroup at smc.vnet.net
- Subject: [mg118983] Re: Count Occurrence of words in a long text
- From: Murray Eisenberg <murray at math.umass.edu>
- Date: Wed, 18 May 2011 07:18:26 -0400 (EDT)
Here's one approach, which I've encapsulated in a Module for convenience:

   wordCounts[txt_] := Module[{words, unique, counts},
     words = StringCases[ToLowerCase[txt], WordCharacter..];
     unique = Union[words];
     counts = Count[words, #] & /@ unique;
     Reverse@SortBy[Transpose[{unique, counts}], Last]
   ]

   (* example *)
   story = ExampleData[{"Text", "AliceInWonderland"}];
   wordCounts[story]

   {{"the", 632}, {"and", 338}, {"a", 278}, {"to", 252}, {"she", 242}, {"of", 199},...

If you want a nice table printout, just use TableForm:

   wordCounts[story] // TableForm

There's at least one anomaly: the "s" at the end of possessives is split off as a separate word.

On 5/17/2011 7:47 AM, Yako wrote:
> Hello,
>
> First of all I am pretty new to Mathematica, so excuse me if this has
> a simple answer.
>
> What I need is to be able to count the occurrence of each word of a
> text and count the times each word appears on it. I know how to do
> this on other languages but I am trying to achieve it with
> mathematica.
>
> Can someone hint me the way to go?
>
> Thanks!
>

--
Murray Eisenberg                     murray at math.umass.edu
Mathematics & Statistics Dept.
Lederle Graduate Research Tower      phone 413 549-1020 (H)
University of Massachusetts                413 545-2859 (W)
710 North Pleasant Street            fax   413 545-1801
Amherst, MA 01003-9305
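
A possible follow-up on the possessive anomaly noted in the reply: the sketch below is one way it might be avoided, and it is not part of the original answer. It assumes the text uses straight ASCII apostrophes and only cares about the letters a-z, so unlike WordCharacter it drops digits and accented letters. The name wordCountsTally is hypothetical. It also swaps the Union/Count pair for Tally, which counts every distinct word in a single pass.

```mathematica
(* Sketch only: wordCountsTally is a hypothetical name, not from the original post. *)
wordCountsTally[txt_String] := Module[{words},
  (* Assumption: straight apostrophes, letters a-z only.
     An apostrophe between letters stays inside the word,
     so "alice's" is kept as a single token. *)
  words = StringCases[ToLowerCase[txt],
    RegularExpression["[a-z]+(?:'[a-z]+)*"]];
  (* Tally returns {word, count} pairs; sort them by descending count. *)
  SortBy[Tally[words], -Last[#] &]
]

(* usage on a short string *)
wordCountsTally["The cat's toy and the dog's toy"]
```

On that short string, "cat's" and "dog's" should each come back as one word instead of leaving a stray "s" entry. Whether counting possessives separately from the bare noun is desirable depends on what the counts are for.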