Re: Using Mathematica for text mining
- To: mathgroup at smc.vnet.net
- Subject: [mg116348] Re: Using Mathematica for text mining
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Fri, 11 Feb 2011 04:18:26 -0500 (EST)
On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:

>Thank you for the response. It looks like that works well to cluster
>words in a single document together, however I'd like to cluster
>entire documents together based on the words they contain. Is that
>possible?

Yes, it is possible. To do this you need to define a distance
function that provides a measure of how different one file is from
another. For example,

FindClusters[filenameList,
  DistanceFunction -> (Abs[
      Length@FindList[#1, "keyword"] -
      Length@FindList[#2, "keyword"]] &)]

would group file names according to the number of lines in each
file that contain "keyword" (FindList returns the matching lines,
so a line with two occurrences is counted once).
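The single-keyword distance above can be extended to compare documents on all of the words they contain. What follows is only a rough sketch under some assumptions: the sample strings in docs are made up for illustration (in practice each would come from something like Import[file, "Text"]), the names words, vocab and vectors are hypothetical, and cosine distance on raw word counts is just one reasonable choice of measure, not the only one.

```mathematica
(* Hypothetical sample documents, stand-ins for the text of real files *)
docs = {
   "the cat sat on the mat",
   "a cat and a dog sat",
   "stocks fell as markets closed",
   "markets rallied and stocks rose"};

(* Split each document into lowercase words *)
words = StringSplit[ToLowerCase[#], Except[WordCharacter] ..] & /@ docs;

(* Vocabulary shared by all documents *)
vocab = Union @@ words;

(* One word-count vector per document, over the shared vocabulary *)
vectors = Table[Count[w, v], {w, words}, {v, vocab}];

(* Cluster the document indices by cosine distance between count vectors *)
FindClusters[N[vectors] -> Range[Length[docs]], 2,
 DistanceFunction -> CosineDistance]
```

The result is a list of clusters of document indices. For real files, replacing Range[Length[docs]] with the list of file names would return the names grouped instead. Very common words tend to dominate raw counts, so dropping them from vocab before building the vectors may give more meaningful groups.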