MathGroup Archive 2011

Re: Count Occurrence of words in a long text

  • To: mathgroup at smc.vnet.net
  • Subject: [mg119021] Re: Count Occurrence of words in a long text
  • From: Murray Eisenberg <murray at math.umass.edu>
  • Date: Fri, 20 May 2011 06:35:13 -0400 (EDT)

Use the function Characters to split each word into its letters.
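
As a minimal sketch of how that might look: Characters splits a string into a list of one-character strings, and Tally then counts each distinct item. The helper name letterCounts is hypothetical (it is not from the thread), and filtering with LetterQ and lowercasing first are assumptions about what a "tally of the letters" should include.

```mathematica
(* Hypothetical helper, not from the thread: tally the letters in a string. *)
letterCounts[txt_String] :=
  Module[{letters},
    (* Characters gives a list of one-character strings;
       keep only letters, so spaces and punctuation are dropped (an assumption) *)
    letters = Select[Characters[ToLowerCase[txt]], LetterQ];
    (* Tally returns {letter, count} pairs; sort so the most frequent comes first *)
    Reverse@SortBy[Tally[letters], Last]
  ]

Characters["alice"]
(* {"a", "l", "i", "c", "e"} *)

letterCounts["banana"]
(* {{"a", 3}, {"n", 2}, {"b", 1}} *)
```

Applied to a whole text, for example letterCounts[ExampleData[{"Text", "AliceInWonderland"}]], this should give one letter-frequency list per text, which can then be compared across languages.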

On 5/19/2011 7:43 AM, Matthias Bode wrote:
> Hello:
>
> Second step:
>
> How could I take apart the words ("alice" should become "a,l,i,c,e") to get a tally of the letters in a text? (Interesting when comparing languages.)
>
> Best regards,
>
> MATTHIAS BODE.
>
>
>> Here's one approach, which I've encapsulated in a Module for convenience:
>>
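>>     wordCounts[txt_] :=
>>       Module[{words, unique, counts},
>>         (* lowercase the text and extract runs of word characters *)
>>         words = StringCases[ToLowerCase[txt], WordCharacter..];
>>         (* sorted list of the distinct words *)
>>         unique = Union[words];
>>         (* number of occurrences of each distinct word *)
>>         counts = Count[words, #]& /@ unique;
>>         (* pair each word with its count, most frequent first *)
>>         Reverse@SortBy[Transpose[{unique, counts}], Last]
>>       ]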
>>
>>     (* example *)
>>     story = ExampleData[{"Text", "AliceInWonderland"}];
>>     wordCounts[story]
>>
>> {{"the", 632}, {"and", 338}, {"a", 278}, {"to", 252}, {"she",
>>     242}, {"of", 199},...
>>
>> If you want a nice table printout, just use TableForm:
>>
>>      wordCounts[story] // TableForm
>>
>> There's at least one anomaly: the "s" at the end of possessives is split
>> off as a separate word.
>>
>> On 5/17/2011 7:47 AM, Yako wrote:
>>> Hello,
>>>
>>> First of all I am pretty new to Mathematica, so excuse me if this has
>>> a simple answer.
>>>
>>> What I need is to count the number of times each word appears in a
>>> text. I know how to do this in other languages, but I am trying to
>>> achieve it with Mathematica.
>>>
>>> Can someone point me in the right direction?
>>>
>>> Thanks!
>>>
>>
>> --
>> Murray Eisenberg                     murray at math.umass.edu
>> Mathematics & Statistics Dept.
>> Lederle Graduate Research Tower      phone 413 549-1020 (H)
>> University of Massachusetts                413 545-2859 (W)
>> 710 North Pleasant Street            fax   413 545-1801
>> Amherst, MA 01003-9305
>>
>
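
Regarding the possessive anomaly noted in the quoted message, one possible workaround is sketched below. It is an illustration only, not the method from the thread: the name wordCountsTally and the regular expression are assumptions, the pattern only recognizes the plain ASCII apostrophe, and Tally is swapped in for the Union/Count pair so the word list is counted in a single step.

```mathematica
(* Hypothetical variant, not from the thread: keep an apostrophe that sits
   between word characters, so "alice's" stays one word instead of
   splitting into "alice" and "s". *)
wordCountsTally[txt_String] :=
  Module[{words},
    (* one or more word characters, optionally followed by 'suffix groups *)
    words = StringCases[ToLowerCase[txt],
      RegularExpression["\\w+(?:'\\w+)*"]];
    (* Tally counts each distinct word; most frequent first *)
    Reverse@SortBy[Tally[words], Last]
  ]

First[wordCountsTally["Alice's cat saw Alice's hat"]]
(* {"alice's", 2} *)
```

If the text uses typographic apostrophes rather than the ASCII one, the pattern would need to be extended accordingly.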


