MathGroup Archive: February 2011 [00265]

Re: Using Mathematica for text mining

To: mathgroup at smc.vnet.net
Subject: [mg116348] Re: Using Mathematica for text mining
From: Bill Rowe <readnews at sbcglobal.net>
Date: Fri, 11 Feb 2011 04:18:26 -0500 (EST)

On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:

>Thank you for the response. It looks like that works well to cluster
>words in a single document together, however I'd like to cluster
>entire documents together based on the words they contain. Is that
>possible?

Yes, it is possible. To do this you need to define a distance
function that provides a measure of how different one file is
from another. For example,

FindClusters[filenameList,
  DistanceFunction -> (Abs[
      Length@FindList[#1, "keyword"] -
       Length@FindList[#2, "keyword"]] &)]

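would group the file names according to the number of lines in
each file that contain "keyword". Note that FindList returns the
matching lines, so a line containing "keyword" twice counts once.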

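A distance based on all the words two files share, rather than on
one keyword, can be sketched along the same lines. This is only a
sketch under assumptions: wordsIn and wordDistance are hypothetical
helper names (not built-in functions), each entry of filenameList is
assumed to be the path of a plain-text file, and the Jaccard distance
on sets of distinct words is just one possible choice of measure.

```mathematica
(* wordsIn: hypothetical helper giving the distinct lower-cased words
   in a plain-text file. The result is cached per file name because
   FindClusters evaluates the distance function on many pairs. *)
wordsIn[file_String] := wordsIn[file] =
  Union[StringSplit[ToLowerCase[Import[file, "Text"]],
    Except[LetterCharacter] ..]]

(* wordDistance: hypothetical Jaccard distance between two files,
   1 - (number of shared words)/(number of words in either file).
   Max[1, ...] avoids division by zero when both files are empty. *)
wordDistance[f1_String, f2_String] :=
  With[{a = wordsIn[f1], b = wordsIn[f2]},
    1. - Length[Intersection[a, b]]/Max[1, Length[Union[a, b]]]]

FindClusters[filenameList, DistanceFunction -> wordDistance]
```

With this measure, two files with the same non-empty vocabulary are
at distance 0 and two files with no words in common are at distance
1. Weighting words by how often they occur would need a different
distance function.
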