Re: Count Occurrence of words in a long text
- To: mathgroup at smc.vnet.net
- Subject: [mg119021] Re: Count Occurrence of words in a long text
- From: Murray Eisenberg <murray at math.umass.edu>
- Date: Fri, 20 May 2011 06:35:13 -0400 (EDT)
Use the function Characters, which splits a string into a list of its individual characters.

On 5/19/2011 7:43 AM, Matthias Bode wrote:
> Hello:
>
> Second step:
>
> How could I take apart the words ("alice" should become "a,l,i,c,e") to get a tally of the letters in a text? (Interesting when comparing languages.)
>
> Best regards,
>
> MATTHIAS BODE.
>
>
>> Here's one approach, which I've encapsulated in a Module for convenience:
>>
>>    wordCounts[txt_] :=
>>     Module[{words,unique,counts},
>>      (* lowercase the text and extract runs of word characters *)
>>      words=StringCases[ToLowerCase[txt],WordCharacter..];
>>      (* sorted list of distinct words *)
>>      unique=Union[words];
>>      (* number of occurrences of each distinct word *)
>>      counts=Count[words,#]&/@unique;
>>      (* pair words with counts, most frequent first *)
>>      Reverse@SortBy[Transpose[{unique,counts}],Last]
>>     ]
>>
>>    (* example *)
>>    story = ExampleData[{"Text", "AliceInWonderland"}];
>>    wordCounts[story]
>>
>>    {{"the", 632}, {"and", 338}, {"a", 278}, {"to", 252}, {"she",
>>    242}, {"of", 199},...
>>
>> If you want a nice table printout, just use TableForm:
>>
>>    wordCounts[story] // TableForm
>>
>> There's at least one anomaly: the "s" at the end of possessives is split
>> off as a separate word.
>>
>> On 5/17/2011 7:47 AM, Yako wrote:
>>> Hello,
>>>
>>> First of all, I am pretty new to Mathematica, so excuse me if this has
>>> a simple answer.
>>>
>>> What I need is to count the occurrences of each word in a text, that
>>> is, how many times each word appears in it. I know how to do this in
>>> other languages, but I am trying to achieve it with Mathematica.
>>>
>>> Can someone point me in the right direction?
>>>
>>> Thanks!
>>>
>

--
Murray Eisenberg                     murray at math.umass.edu
Mathematics & Statistics Dept.
Lederle Graduate Research Tower      phone 413 549-1020 (H)
University of Massachusetts                413 545-2859 (W)
710 North Pleasant Street            fax   413 545-1801
Amherst, MA 01003-9305
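
As a rough illustration of the Characters suggestion in the reply above, here is a minimal sketch of a letter tally. The helper name letterCounts is hypothetical and does not appear in the thread, and pairing Characters with Tally, LetterQ and SortBy is one possible approach under that assumption, not necessarily what the author had in mind.

```mathematica
(* letterCounts is a hypothetical helper name, not taken from the thread *)
letterCounts[txt_String] :=
 Module[{letters},
  (* split the lowercased text into single characters and keep only letters *)
  letters = Select[Characters[ToLowerCase[txt]], LetterQ];
  (* Tally returns {letter, count} pairs; sort by descending count,
     breaking ties alphabetically *)
  SortBy[Tally[letters], {(-Last[#] &), First}]
 ]

Characters["alice"]
(* {"a", "l", "i", "c", "e"} *)

letterCounts["Banana!"]
(* {{"a", 3}, {"n", 2}, {"b", 1}} *)
```

Applied to a whole text, for example letterCounts[ExampleData[{"Text", "AliceInWonderland"}]], this would give one letter-frequency list per text, which could then be compared across languages as the question suggests.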