MathGroup Archive 2011

Re: Count Occurrence of words in a long text

  • To: mathgroup at smc.vnet.net
  • Subject: [mg118983] Re: Count Occurrence of words in a long text
  • From: Murray Eisenberg <murray at math.umass.edu>
  • Date: Wed, 18 May 2011 07:18:26 -0400 (EDT)

Here's one approach, which I've encapsulated in a Module for convenience:

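   wordCounts[txt_] :=
     Module[{words, unique, counts},
       (* lowercase the text, then extract maximal runs of letters/digits *)
       words = StringCases[ToLowerCase[txt], WordCharacter ..];
       unique = Union[words];  (* sorted list of distinct words *)
       counts = Count[words, #] & /@ unique;  (* occurrences of each distinct word *)
       (* pair each word with its count, most frequent first *)
       Reverse@SortBy[Transpose[{unique, counts}], Last]
     ]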

   (* example *)
   story = ExampleData[{"Text", "AliceInWonderland"}];
   wordCounts[story]

{{"the", 632}, {"and", 338}, {"a", 278}, {"to", 252}, {"she",
   242}, {"of", 199},...
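
The function above rescans the whole word list once per distinct word, which can be slow on a long text. As a rough sketch of an alternative (not part of the approach above), the built-in Tally counts every distinct element in one pass. The name wordCountsTally is made up for this illustration:

```mathematica
(* wordCountsTally is a hypothetical name used only for this sketch *)
wordCountsTally[txt_String] :=
  Module[{words},
    (* same tokenization as wordCounts: lowercase, runs of letters/digits *)
    words = StringCases[ToLowerCase[txt], WordCharacter ..];
    (* Tally returns {word, count} pairs; sort by count, largest first *)
    Reverse@SortBy[Tally[words], Last]
  ]

(* small usage example *)
wordCountsTally["The cat and the hat and the bat"]
```

This should give the same counts as wordCounts, since only the counting step differs.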

If you want a nice table printout, just use TableForm:

    wordCounts[story] // TableForm

There's at least one anomaly: because the apostrophe is not a
WordCharacter, the "s" at the end of a possessive (e.g. "Alice's") is
split off and counted as a separate word.
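
One possible way around the possessive problem, sketched under the assumption that the text is ASCII and uses the straight apostrophe ('), is to let a word optionally continue with an apostrophe followed by more letters. The helper name wordsWithApostrophes is hypothetical:

```mathematica
(* wordsWithApostrophes is a hypothetical helper used only for this sketch *)
(* assumes ASCII letters/digits and the straight apostrophe character *)
wordsWithApostrophes[txt_String] :=
  StringCases[ToLowerCase[txt],
    RegularExpression["[a-z0-9]+(?:'[a-z]+)?"]]

wordsWithApostrophes["Alice's cat sat on Alice's mat"]
(* {"alice's", "cat", "sat", "on", "alice's", "mat"} *)
```

Because the apostrophe must be followed by at least one letter, a closing single quote after a word is not absorbed into it. Contractions such as "don't" are kept whole as a side effect, which may or may not be what you want.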

On 5/17/2011 7:47 AM, Yako wrote:
> Hello,
>
> First of all I am pretty new to Mathematica, so excuse me if this has
> a simple answer.
>
> What I need is to count how many times each word appears in a
> text. I know how to do this in other languages, but I am trying to
> achieve it with Mathematica.
>
> Can someone point me in the right direction?
>
> Thanks!
>

-- 
Murray Eisenberg                     murray at math.umass.edu
Mathematics & Statistics Dept.
Lederle Graduate Research Tower      phone 413 549-1020 (H)
University of Massachusetts                413 545-2859 (W)
710 North Pleasant Street            fax   413 545-1801
Amherst, MA 01003-9305

