Re: Using Mathematica for text mining
- To: mathgroup at smc.vnet.net
- Subject: [mg116348] Re: Using Mathematica for text mining
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Fri, 11 Feb 2011 04:18:26 -0500 (EST)
On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:
>Thank you for the response. It looks like that works well to cluster
>words in a single document together, however I'd like to cluster
>entire documents together based on the words they contain. Is that
>possible?
Yes, it is possible. One way is to give FindClusters a distance
function that measures how different one file is from another.
For example,
FindClusters[filenameList,
  (* FindList[file, "keyword"] returns the lines of file containing "keyword" *)
  DistanceFunction -> (Abs[
      Length@FindList[#1, "keyword"] -
      Length@FindList[#2, "keyword"]] &)]
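would group file names according to the number of lines in each
file that contain "keyword". Since FindList returns each matching
line once, a line with several occurrences of "keyword" counts only
once.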
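The same idea extends from one keyword to every word in the documents. What follows is a minimal sketch, assuming Mathematica 10 or later (for Counts and Lookup), with small hypothetical in-memory strings standing in for files. Each document becomes a vector of word counts over the shared vocabulary, and FindClusters groups the file names by the cosine distance between those vectors. The names docs, words, vocab and countVector are illustrative only, and the choice of CosineDistance and of 2 clusters is mine, not something from the question.

```mathematica
(* Hypothetical example documents; for real files use ReadString[file] as each value *)
docs = <|
   "a.txt" -> "apples and oranges are fruit; apples are sweet",
   "b.txt" -> "oranges and apples make good fruit juice",
   "c.txt" -> "the compiler parses source code into a syntax tree",
   "d.txt" -> "a parser turns source code into a syntax tree"|>;

(* Lower-case word tokens of a document *)
words[text_String] := ToLowerCase[StringCases[text, WordCharacter ..]];

(* Vocabulary: every distinct word appearing in any document *)
vocab = Union @@ (words /@ Values[docs]);

(* Count vector of a document over the shared vocabulary (0 for absent words) *)
countVector[text_String] := Lookup[Counts[words[text]], vocab, 0];

vectors = N[countVector /@ Values[docs]];

(* Cluster the file names by cosine distance between their count vectors *)
FindClusters[vectors -> Keys[docs], 2, DistanceFunction -> CosineDistance]
```

Cosine distance compares the relative mix of words rather than raw counts, so a long and a short document on the same topic can still end up together. Dropping the 2 lets FindClusters choose the number of clusters itself. With real text, very common words such as "the" and "and" are usually removed from vocab first, since otherwise they tend to dominate the counts.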