MathGroup Archive 2011

Re: Using Mathematica for text mining

  • To: mathgroup at smc.vnet.net
  • Subject: [mg116348] Re: Using Mathematica for text mining
  • From: Bill Rowe <readnews at sbcglobal.net>
  • Date: Fri, 11 Feb 2011 04:18:26 -0500 (EST)

On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:

>Thank you for the response. It looks like that works well to cluster
>words in a single document together, however I'd like to cluster
>entire documents together based on the words they contain. Is that
>possible?

Yes, it is possible. To do this you need to define a distance
function that provides a measure of how different one file is
from another. For example,

FindClusters[filenameList,
  DistanceFunction -> (Abs[
      Length@FindList[#1, "keyword"] -
       Length@FindList[#2, "keyword"]] &)]

would group file names according to the number of lines in each
file that contain "keyword" (FindList returns the matching lines,
so a line with several occurrences is counted once).
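
The same idea can be extended to several words at once. What follows is only a rough sketch, not something tested on real data: it assumes filenameList holds the paths of plain text files, and the keywords list is a made-up placeholder. Each file is reduced to a vector of counts (lines containing each keyword), computed once up front so the files are not re-read for every pairwise comparison, and the vectors are then clustered while the file names are returned.

```mathematica
(* Hypothetical keyword list; replace with terms relevant to your documents *)
keywords = {"keyword1", "keyword2", "keyword3"};

(* One row per file: the number of lines containing each keyword *)
vectors = N@Table[Length@FindList[f, k], {f, filenameList}, {k, keywords}];

(* Cluster on the count vectors, but return the corresponding file names *)
FindClusters[vectors -> filenameList, DistanceFunction -> EuclideanDistance]
```

EuclideanDistance is just one plausible choice here; another built-in such as CosineDistance may suit documents of very different lengths better.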


