MathGroup Archive 2004

Re: MathGroup /: Descriptive headings

  • To: mathgroup at smc.vnet.net
  • Subject: [mg52015] Re: MathGroup /: Descriptive headings
  • From: Bill Rowe <readnewsciv at earthlink.net>
  • Date: Sun, 7 Nov 2004 01:03:37 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

On 11/6/04 at 2:07 AM, steve at smc.vnet.net (Steven M. Christensen)
wrote:

>Another idea would be for someone clever to write a script that
>could categorize a post.   For example, all words in a post could
>be extracted to a list and then compared to a list of categories
>and those categories that fit could be chosen and put on say
>the top line of the post to help with filtering. Some posts might
>not be easy to treat in this way, but it might help.
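
The keyword-matching idea in the quoted suggestion can be sketched in a few lines. This is a minimal illustration only: the category names and keyword sets below are hypothetical placeholders, not an actual MathGroup category list, and plain word matching is the simplest possible approach, not what any existing tool does.

```python
import re

# Hypothetical category -> keyword table; a real list would be curated by hand.
CATEGORIES = {
    "Integration": {"integrate", "integral", "nintegrate"},
    "Plotting": {"plot", "listplot", "graphics"},
    "Optimization": {"nmaximize", "nminimize", "findminimum"},
}

def categorize(post: str) -> list:
    """Return the sorted names of categories whose keywords appear in the post."""
    # Extract all words, lowercased, into a set for fast membership tests.
    words = set(re.findall(r"[a-z]+", post.lower()))
    return sorted(name for name, keys in CATEGORIES.items() if words & keys)

def tag_subject(subject: str, post: str) -> str:
    """Prefix the subject with the matched categories, e.g. '[Plotting] ...'."""
    tags = "".join(f"[{c}]" for c in categorize(post))
    return f"{tags} {subject}" if tags else subject

print(tag_subject("NMaximize woes", "Why does NMaximize fail when I Plot this?"))
# [Optimization][Plotting] NMaximize woes
```

A fixed keyword list is easy to understand but brittle; posts that use none of the listed words simply go untagged.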

The essence of such a script already exists.

Take a look at POPFile <http://popfile.sourceforge.net/>

Basically, it computes the probability that a message fits within a given category. The algorithm depends on training; that is, it builds a database from the messages you classify. As more messages get classified, the algorithm becomes more accurate, leaving fewer messages for you to classify by hand. In addition to providing an automated way to add a descriptive word to the subject line, POPFile should greatly help separate good messages from spam.
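To make the training idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. POPFile is generally described as a naive Bayes classifier, but this is a generic textbook version written for illustration, not POPFile's code; the `NaiveBayes` class, the tokenizer, and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> messages trained on
        self.vocab = set()                       # every word seen in training

    def train(self, text: str, category: str) -> None:
        # Each message the user classifies updates the "database".
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text: str):
        """Return the most probable category, or None if nothing is trained."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            # Sum log-probabilities to avoid underflow on long messages.
            score = math.log(n_docs / total_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best

nb = NaiveBayes()
for text, cat in [
    ("NIntegrate returns an error for this integral", "integration"),
    ("How do I integrate a piecewise function", "integration"),
    ("ListPlot ignores my PlotRange option", "plotting"),
    ("Plot labels overlap in Graphics output", "plotting"),
]:
    nb.train(text, cat)

print(nb.classify("integral of a piecewise function"))  # integration
```

The more messages passed to `train`, the better the word statistics per category, which is the behaviour the paragraph above describes: accuracy improves with training.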

While I haven't used POPFile, I do use SpamSieve, a Mac application based on the same algorithms as POPFile, and I have found SpamSieve to be a very effective spam solution. For example, out of the last 14,000 messages SpamSieve has filtered, it incorrectly identified 7 good messages as spam and 70 spam messages as good, for an accuracy rate of 99.5%. I assume similar results would be obtained with POPFile.
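The accuracy figure follows directly from the counts above. The short calculation below uses only the numbers given in the post; since the post does not say how many of the 14,000 messages were spam, per-class error rates cannot be derived from it.

```python
total = 14_000        # messages filtered (from the post)
false_positives = 7   # good messages marked as spam
false_negatives = 70  # spam messages passed through as good

errors = false_positives + false_negatives
accuracy = (total - errors) / total
print(f"{errors} errors, accuracy {accuracy:.2%}")  # 77 errors, accuracy 99.45%
```

99.45% rounds to the 99.5% quoted.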

As for the idea of adding descriptive identifiers, I have little interest one way or the other. They would make it easier to filter groups of messages, but I doubt I would change my current method of reading messages posted to MathGroup to take advantage of them.

I currently read messages with an email client and much prefer the mailing list to newsgroups or a web interface. Your scripts that add the [mgXXXX] tag have already broken threading in my email client, but they do make filtering the messages into a specific mailbox simpler.

As for adding a descriptive identifier to the subject line, particularly one enclosed in [], I already have a script that strips the [mgXXXX] from the subject line as well as things like Re: Re: Re:. It would be trivial for me to modify the script to strip more from the subject line if I found it useful to do so.
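A subject-cleaning script of this kind can be sketched with two regular expressions. This is a hypothetical Python illustration, not the script mentioned above; the `clean_subject` name and its `extra_tags` switch (standing in for "strip more") are assumptions.

```python
import re

# [mgNNNNN] tags added by the list software, e.g. "[mg52015]".
MG_TAG = re.compile(r"\[mg\d+\]\s*")
# One or more leading "Re:" prefixes, any case, with trailing spaces.
RE_PREFIX = re.compile(r"^\s*(?:re:\s*)+", re.IGNORECASE)

def clean_subject(subject: str, extra_tags: bool = False) -> str:
    s = MG_TAG.sub("", subject)
    if extra_tags:
        # Hypothetical extension: also drop any other [bracketed] identifier.
        s = re.sub(r"\[[^\]]*\]\s*", "", s)
    s = RE_PREFIX.sub("", s)
    return s.strip()

print(clean_subject("[mg52015] Re: MathGroup /: Descriptive headings"))
# MathGroup /: Descriptive headings
```

Extending it to strip a bracketed category tag only requires the extra pattern guarded by `extra_tags`.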
--
To reply via email subtract one hundred and four