Privacy: HOWTO: Fight Spam with SpamProbe

By Steve Hastings
May 1, 2003
How to set up this trainable e-mail filter to eliminate false positives, work with IMAP and run as a cron job.
I get a lot of spam e-mail. These days, however, most of it doesn't go to my e-mail Inbox, because I'm filtering my e-mail with SpamProbe. SpamProbe is a spam detector; you train it to recognize what you consider to be spam. It builds databases of keywords from your e-mail messages and then uses the keyword databases to decide whether incoming e-mail messages are spam.
In this article I explain how to set up SpamProbe to intercept spam e-mails
and file them into a folder named Spam. If you prefer, you also may set it up to
delete these messages. The setup I describe enables spam checking on a per-user
basis, and users control which of their messages are considered to be spam. The
setup is completely server-based and thus works with any e-mail client. Users
need to understand only how to move messages from one mail folder to another.
Because it handles spam completely on the server, SpamProbe is great for
users who must access their mail over a slow link, such as a modem. Client-based
filters must download all the mail, spam and nonspam alike, while a
server-based filter can keep all the spam on the server.
The setup described in this article works with any trainable spam filter, not
only SpamProbe.
Why SpamProbe?
Why use SpamProbe instead of another spam filter? I argue you should use
it because it is a Bayesian filter with some advanced features. Bayesian spam
filters work by building two databases: a database of keywords from spam e-mails
and a database of keywords from nonspam e-mails. They then analyze each new
e-mail message, comparing keywords against the two databases and estimating the
probability that the message is spam. You train a Bayesian spam filter by
feeding it spam messages, from which it builds the spam keywords database, and
nonspam messages, from which it builds the nonspam keywords database.
Whoever controls the training of the filter thus controls what that filter
considers spam.
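The two-database idea described above can be sketched in a few lines of Python. This is a generic naive Bayes classifier written for illustration, not SpamProbe's actual scoring code; the class name TinyBayesFilter, the tokenizer and the add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class TinyBayesFilter:
    """Illustrative two-database filter (hypothetical, not SpamProbe's code)."""

    def __init__(self):
        self.spam_counts = Counter()  # keyword -> number of spam messages containing it
        self.good_counts = Counter()  # keyword -> number of nonspam messages containing it
        self.spam_messages = 0
        self.good_messages = 0

    def train(self, text, is_spam):
        """Add one message's keywords to the spam or nonspam database."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        else:
            self.good_counts.update(tokens)
            self.good_messages += 1

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        # Start from the prior odds, then add each keyword's evidence.
        log_odds = math.log((self.spam_messages + 1) / (self.good_messages + 1))
        for token in set(tokenize(text)):
            # Add-one smoothing keeps unseen keywords from producing zeros.
            p_spam = (self.spam_counts[token] + 1) / (self.spam_messages + 2)
            p_good = (self.good_counts[token] + 1) / (self.good_messages + 2)
            log_odds += math.log(p_spam / p_good)
        # Clamp so that exp() cannot overflow on very long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


spam_filter = TinyBayesFilter()
spam_filter.train("make money fast", is_spam=True)
spam_filter.train("cheap money offer", is_spam=True)
spam_filter.train("meeting agenda for monday", is_spam=False)
spam_filter.train("lunch on monday", is_spam=False)
print(spam_filter.spam_probability("make money now"))   # above 0.5
print(spam_filter.spam_probability("monday meeting"))   # below 0.5
```

Because the person calling train() decides which database each message goes into, the sketch also shows why whoever trains the filter defines what counts as spam.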
As the filter processes incoming e-mail messages, it continues to update its
keyword databases. Each message it flags as spam also is used to update the spam
keywords database. As users feed corrections back into the system, the filter
becomes better and better at detecting spam.
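The feedback loop described above might look roughly like the following sketch. It assumes a correction removes the message's keywords from the database they were wrongly added to and adds them to the other one; the KeywordDatabases class and its method names are hypothetical and do not reflect how SpamProbe stores its data.

```python
from collections import Counter


class KeywordDatabases:
    """Hypothetical pair of keyword databases, one per label."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}

    def learn(self, words, label):
        """Count each distinct keyword of a message under the given label."""
        self.counts[label].update(set(words))

    def unlearn(self, words, label):
        """Undo an earlier learn() call for the same message and label."""
        self.counts[label].subtract(set(words))
        # Unary plus drops entries whose count fell to zero or below.
        self.counts[label] = +self.counts[label]

    def correct(self, words, wrong_label):
        """Move a misfiled message's keywords to the other database."""
        right_label = "good" if wrong_label == "spam" else "spam"
        self.unlearn(words, wrong_label)
        self.learn(words, right_label)


db = KeywordDatabases()
message = "project budget offer".split()
db.learn(message, "spam")    # the filter flagged the message and learned from it
db.correct(message, "spam")  # the user moved it back out of the Spam folder
print(db.counts["good"]["offer"])  # → 1
```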
Bayesian spam filters are efficient: they don't load down a server too much,
and they don't depend on a connection to an external server to access a spam
database. Once they are trained, they can block almost all spam messages, with
few or no false positives.
SpamProbe builds its database using not only single keywords but pairs of
keywords too. The word money, by itself, might not indicate spam reliably; the
phrase "make money" is a much better indicator. An ideal spam filter might use
even longer chains of words, but that would be quite expensive computationally.
SpamProbe also correctly handles e-mails and attachments in BASE64 or
quoted-printable encoding, and it has a feature for handling Asian character
sets. SpamProbe is released under the QPL, so it is free for use by anyone.
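To illustrate the single-word and word-pair keywords mentioned above, here is a minimal sketch of a tokenizer that emits both. The function name words_and_pairs and the word-splitting rule are assumptions made for the example; SpamProbe's own tokenizer is more elaborate.

```python
import re


def words_and_pairs(text):
    """Return single words followed by adjacent two-word phrases."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # zip(words, words[1:]) walks over each word and its right-hand neighbor.
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs


print(words_and_pairs("Make money fast"))
# → ['make', 'money', 'fast', 'make money', 'money fast']
```

A message of n words yields at most n - 1 pairs, so adding pairs roughly doubles the number of keywords. Longer chains would enlarge the databases further, which is the computational cost the text refers to.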
Full Story: Linux Journal