Re: Using Mathematica for text mining
- To: mathgroup at smc.vnet.net
- Subject: [mg116435] Re: Using Mathematica for text mining
- From: Cameron Christiansen <cam at byu.edu>
- Date: Tue, 15 Feb 2011 06:34:07 -0500 (EST)
Thank you for the responses. They were helpful. I had given up on it, but you show that it is possible. Thanks.

> On Fri, Feb 11, 2011 at 2:18 AM, Bill Rowe <readnews at sbcglobal.net> wrote:
>
>> On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:
>>
>> >Thank you for the response. It looks like that works well to cluster
>> >words in a single document together, however I'd like to cluster
>> >entire documents together based on the words they contain. Is that
>> >possible?
>>
>> Yes, it is possible. To do this you need to define a distance
>> function that provides a measure of how different one file is
>> from another. For example,
>>
>> FindClusters[filenameList,
>>   DistanceFunction -> (Abs[
>>     Length@FindList[#1, "keyword"] -
>>     Length@FindList[#2, "keyword"]] &)]
>>
>> would group file names according to the number of occurrences of
>> "keyword" in each file.
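
The single-keyword distance quoted above can be extended to all the words each file contains by turning every document into a vector of word counts and clustering those vectors. What follows is only a rough sketch, not a tested recipe: the directory name "docs", the helper wordList, and the choice of CosineDistance are assumptions made for illustration and are not taken from the thread.

```mathematica
(* Hypothetical input: every .txt file in a directory named "docs" *)
files = FileNames["*.txt", "docs"];

(* Hypothetical helper: lower-cased word list for one file,
   split on runs of non-word characters *)
wordList[file_String] :=
  StringSplit[ToLowerCase[Import[file, "Text"]], Except[WordCharacter] ..]

words = wordList /@ files;

(* Every distinct word seen in any document *)
vocabulary = Union @@ words;

(* One count vector per document, in vocabulary order *)
vectors = Table[Count[w, v], {w, words}, {v, vocabulary}];

(* Cluster on the count vectors, but return the matching file names *)
FindClusters[N[vectors] -> files, DistanceFunction -> CosineDistance]
```

Cosine distance compares the relative mix of words rather than raw counts, so a long and a short file on the same topic can still land in the same cluster. The Count-based table is simple but slow for a large vocabulary, and an empty file produces an all-zero vector that may need to be filtered out first.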