MathGroup Archive 2011

Re: Count Occurrence of words in a long text

  • To: mathgroup at smc.vnet.net
  • Subject: [mg119015] Re: Count Occurrence of words in a long text
  • From: Bob Hanlon <hanlonr at cox.net>
  • Date: Fri, 20 May 2011 06:34:08 -0400 (EDT)

Characters["Alice"]

{"A", "l", "i", "c", "e"}
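
Going from Characters to a full letter tally could look roughly like the sketch below. The name letterCounts is hypothetical (it does not appear in this thread), and the sketch assumes that case should be ignored and that anything LetterQ does not accept (spaces, digits, punctuation) should be dropped.

```mathematica
(* letterCounts is a hypothetical helper, not from the thread. *)
(* Assumption: fold case, keep only characters for which LetterQ is True. *)
letterCounts[txt_String] :=
  Reverse@SortBy[
    Tally[Select[Characters[ToLowerCase[txt]], LetterQ]],
    Last]

(* example: each element is a {letter, count} pair, most frequent first *)
letterCounts["Alice in Wonderland"]
```

Letters with equal counts come out in whatever order the sort leaves them, so only the ranking by count should be relied on when comparing languages.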


Bob Hanlon

---- Matthias Bode <lvsaba at hotmail.com> wrote: 

=============
Hello:

Second step:

How could I take apart the words ("alice" should become "a,l,i,c,e") to get a tally of the letters in a text? (Interesting when comparing languages.)

Best regards,

MATTHIAS BODE.


> Here's one approach, which I've encapsulated in a Module for convenience:
>
>    wordCounts[txt_] :=
>      Module[{words,unique,counts},
>        words=StringCases[ToLowerCase[txt],WordCharacter..];
>        unique=Union[words];
>        counts=Count[words,#]&/@unique;
>        Reverse@SortBy[Transpose[{unique,counts}],Last]
>    ]
>
>    (* example *)
>    story = ExampleData[{"Text", "AliceInWonderland"}];
>    wordCounts[story]
>
> {{"the", 632}, {"and", 338}, {"a", 278}, {"to", 252}, {"she", 242}, {"of", 199},...
>
> If you want a nice table printout, just use TableForm:
>
>     wordCounts[story] // TableForm
>
> There's at least one anomaly: the "s" at the end of possessives is split
> off as a separate word.
>
> On 5/17/2011 7:47 AM, Yako wrote:
> > Hello,
> >
> > First of all I am pretty new to Mathematica, so excuse me if this has
> > a simple answer.
> >
> > What I need is to count how many times each word occurs in a
> > text. I know how to do this in other languages, but I am trying
> > to achieve it with Mathematica.
> >
> > Can someone point me in the right direction?
> >
> > Thanks!
> >
>
> --
> Murray Eisenberg                     murray at math.umass.edu
> Mathematics & Statistics Dept.
> Lederle Graduate Research Tower      phone 413 549-1020 (H)
> University of Massachusetts                413 545-2859 (W)
> 710 North Pleasant Street            fax   413 545-1801
> Amherst, MA 01003-9305
>
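
One possible variation on the quoted wordCounts, offered as a sketch rather than as the poster's method: Tally collects the counts in a single pass instead of calling Count once per unique word, and the pattern keeps an apostrophe-joined suffix attached to its word, which is one way to avoid the split-off possessive "s". The name wordCountsTally is hypothetical, and the sketch assumes the text uses the straight ASCII apostrophe; typographic apostrophes would still split.

```mathematica
(* wordCountsTally is a hypothetical name, not from the thread. *)
(* Assumption: possessives and contractions use the ASCII apostrophe "'". *)
wordCountsTally[txt_String] :=
  Module[{words},
    (* a word: one or more word characters, then zero or more
       groups of an apostrophe followed by more word characters *)
    words = StringCases[ToLowerCase[txt],
      Repeated[WordCharacter] ~~
        RepeatedNull["'" ~~ Repeated[WordCharacter]]];
    (* Tally returns {word, count} pairs; sort by count, largest first *)
    Reverse@SortBy[Tally[words], Last]]

(* example *)
wordCountsTally["Alice's cat saw Alice's hat"]
```

With this pattern "alice's" is counted as its own word, separate from "alice"; whether that is preferable to merging the two depends on what the counts are for.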


