MathGroup Archive 2011

Re: Using Mathematica for text mining

  • To: mathgroup at smc.vnet.net
  • Subject: [mg116435] Re: Using Mathematica for text mining
  • From: Cameron Christiansen <cam at byu.edu>
  • Date: Tue, 15 Feb 2011 06:34:07 -0500 (EST)

Thank you for the responses. They were helpful. I had given up on it, but you
have shown that it is possible. Thanks.


> On Fri, Feb 11, 2011 at 2:18 AM, Bill Rowe <readnews at sbcglobal.net> wrote:
>
>> On 2/10/11 at 5:20 AM, cam at byu.edu (Cameron Christiansen) wrote:
>>
>> >Thank you for the response. It looks like that works well to cluster
>> >words in a single document together, however I'd like to cluster
>> >entire documents together based on the words they contain. Is that
>> >possible?
>>
>> Yes, it is possible. To do this you need to define a distance
>> function that provides a measure of how different one file is
>> from another. For example,
>>
>> FindClusters[filenameList,
>>  DistanceFunction -> (Abs[
>>      Length@FindList[#1, "keyword"] -
>>       Length@FindList[#2, "keyword"]] &)]
>>
>> would group file names according to the number of lines in each
>> file that contain "keyword".
>>
>>
>>
>
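
The distance-function idea quoted above can be extended from one keyword to several. What follows is only a rough sketch under stated assumptions, not the method proposed in the thread: the directory "docs", the keyword list, and the helper names keywordCount and featureVector are all hypothetical. It counts occurrences in each file once, then clusters the resulting count vectors, so the files are not reread for every pairwise comparison.

```mathematica
(* Assumption: the documents are plain-text files in a hypothetical "docs" directory *)
filenameList = FileNames["*.txt", "docs"];

(* Assumption: illustrative keywords; choose ones suited to your corpus *)
keywords = {"matrix", "integral", "cluster"};

(* Hypothetical helper: number of case-insensitive occurrences of kw in a file *)
keywordCount[file_String, kw_String] :=
  StringCount[Import[file, "Text"], kw, IgnoreCase -> True]

(* Hypothetical helper: one count per keyword, giving a numeric vector per file *)
featureVector[file_String] := N[keywordCount[file, #] & /@ keywords]

(* Read each file once, then cluster the vectors and report the file names *)
vectors = featureVector /@ filenameList;
FindClusters[vectors -> filenameList, DistanceFunction -> EuclideanDistance]
```

The rule form vectors -> filenameList makes FindClusters measure distance on the vectors but return the corresponding file names. Raw counts favor long documents, so dividing each vector by the document's word count may be a reasonable refinement.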


