sa-harvest

Summary

sa-harvest is a configurable script for training SpamAssassin. It combines the process from How To Train SpamAssassin with automatic generation of the whitelist, based on the contents of your ham inboxes and your outbox.

The goal is to let you type one simple command (sa-harvest) rather than a series of complex commands with varying flags.

Caveats

Details

In your ~/.spamassassin directory you will create 6 new files:

user_prefs.base : everything in user_prefs except the whitelist_from entries, which the script generates automatically from the addressbook files and your history of recent mail

addressbook : a list of any addresses you want to let through

addressbook.negative : patterns of addresses you want kept off the whitelist, e.g. your own email address, because you don’t want to give a free pass to any spammer who knows to forge your address as the From address. This may be overly greedy: it’s a substring match, so an entry of paypal.com matches any address that contains paypal.com anywhere in it.

mail.spam : list of paths - relative to your home dir - to mailboxes you consider spam, one mailbox per line, e.g.

   Maildir/.Spam/cur

mail.ham : similarly, a list of paths to mailboxes you consider ham, e.g.

   Maildir/cur
   Maildir/.family/cur

mail.sent : mailboxes you consider sent mail, e.g. Maildir/.Sent/cur
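
As a rough illustration of how the files above fit together, here is a minimal Python sketch. It is not sa-harvest's actual code: the function names (read_list, resolve_mailboxes, is_negative, whitelist_lines) are hypothetical, and the case-insensitive comparison is an assumption. It only shows the two behaviours described above: mailbox paths are resolved relative to your home directory, and addressbook.negative entries are applied as substring matches before whitelist_from lines are written.

```python
from pathlib import Path


def read_list(path):
    """Return the non-blank lines of a list file, stripped; [] if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return []
    stripped = (line.strip() for line in p.read_text().splitlines())
    return [line for line in stripped if line]


def resolve_mailboxes(list_file, home):
    """Resolve each entry of a mail.* list file relative to the home directory."""
    return [Path(home) / entry for entry in read_list(list_file)]


def is_negative(address, negative_patterns):
    """Substring match: reject the address if any pattern occurs anywhere in it.

    Lowercasing both sides is an assumption made for this sketch.
    """
    address = address.lower()
    return any(pattern.lower() in address for pattern in negative_patterns)


def whitelist_lines(candidates, negative_patterns):
    """Build sorted, de-duplicated whitelist_from lines, skipping negative matches."""
    kept = {a.lower() for a in candidates if not is_negative(a, negative_patterns)}
    return [f"whitelist_from {a}" for a in sorted(kept)]


if __name__ == "__main__":
    conf = Path.home() / ".spamassassin"
    negative = read_list(conf / "addressbook.negative")
    # Here only the addressbook feeds the whitelist; harvesting addresses
    # from the ham and sent mailboxes is left out of the sketch.
    for line in whitelist_lines(read_list(conf / "addressbook"), negative):
        print(line)
    for mailbox in resolve_mailboxes(conf / "mail.ham", Path.home()):
        print("ham mailbox:", mailbox)
```

Note how greedy the substring test is: with paypal.com in addressbook.negative, is_negative also rejects an address such as security@paypal.com.example.net, which is the behaviour the caveat above warns about.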

What It Does

Setup

Future Work

One Last Note

If you’re really feeling lucky you can set this up with cron. Note that it can be fairly processor intensive.

If you do that, you need to check your ham and spam when you log in so you quickly catch any mis-identification (and fix the resulting incorrect training). If you use Maildir then restricting your training to cur directories helps cut down on that problem, but it isn’t perfect, and mbox and mbx don’t have such an option.

Feedback

Please send me some: faisal@faisal.com.