sa-quicktrain

Summary

sa-quicktrain is a script that lets you designate specific messages for SpamAssassin to be trained on. It differs from other approaches in that you select the messages from inside your mail client, simply by filing them into a folder.

To use this:

Put this script somewhere useful (~/bin? /usr/local/bin?) and make it executable (for example, with chmod +x).
Set up a cron job to run the script at a comfortable interval. The script is fairly lightweight, so it’s probably safe to run it every 5 minutes (and, on fast machines, even more frequently).
Create a Train folder, and within it, a Ham and a Spam folder.
File any misclassified ham or spam into the appropriate folder under Train.
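
If you would rather create the training folders from a script than from your mail client, the following is a minimal sketch in Python. It is not part of sa-quicktrain. It assumes the Courier-style layout in which folders are dot-prefixed subdirectories of the main Maildir (so Train.Spam lives in .Train.Spam); the function name ensure_train_folders and the path in the usage note are made up for illustration.

```python
import mailbox


def ensure_train_folders(maildir_root):
    """Create Train, Train.Ham and Train.Spam under maildir_root if missing.

    Assumes dot-prefixed, Courier-style folders. The parent directory of
    maildir_root must already exist.
    """
    box = mailbox.Maildir(maildir_root, create=True)
    for name in ("Train", "Train.Ham", "Train.Spam"):
        # add_folder() creates .<name>/ with cur, new and tmp inside it;
        # calling it for a folder that already exists is harmless.
        box.add_folder(name)
    return sorted(box.list_folders())
```

For example, ensure_train_folders(os.path.expanduser("~/Maildir")) after an import os. Your mail client may still need to be told to subscribe to the new folders.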

Whenever the script runs it will retrain all messages in Train.Spam as spam, then file them into the Spam folder. Then it will retrain all messages in Train.Ham as ham, and file them into INBOX. You will need to go back and refile retrained ham messages into the folder to which they should have been delivered.
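
To make the train-then-refile cycle concrete, here is a rough Python sketch of one pass over a single training folder. It is an illustration of the idea, not the script's actual code. The function name retrain_folder, its parameters, and the choice to leave a message in place when training fails are all assumptions; the only SpamAssassin pieces used are sa-learn with its --spam and --ham options.

```python
import os
import shutil
import subprocess


def retrain_folder(src_maildir, dest_maildir, kind, learn_cmd=("sa-learn",)):
    """Train each message in src_maildir as `kind`, then move it to dest_maildir.

    kind is "spam" or "ham". learn_cmd is the training command (hypothetical
    parameter, handy for testing). Returns the file names that were moved.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    moved = []
    for sub in ("new", "cur"):
        src_dir = os.path.join(src_maildir, sub)
        if not os.path.isdir(src_dir):
            continue
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            # Standard output is discarded; standard error is left alone,
            # so cron can still report problems.
            result = subprocess.run(
                [*learn_cmd, "--" + kind, path],
                stdout=subprocess.DEVNULL,
            )
            if result.returncode != 0:
                continue  # assumption: keep the message in Train for next run
            # Keep the message in the same subdirectory (new or cur).
            shutil.move(path, os.path.join(dest_maildir, sub, name))
            moved.append(name)
    return moved
```

A full pass would call this twice, once for Train.Spam with the Spam folder as destination and once for Train.Ham with INBOX as destination.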

Caveats & Notes

Right now this only works with Maildir.
It assumes your INBOX is in the normal Maildir location and your spam folder is named Spam. If either assumption doesn’t hold, you’ll likely have to edit the script. Please also let me know what your layout is so future versions of the script can accommodate it.
Standard output is suppressed; standard error is not. If you’re running this from cron, you’ll still get any error messages.
There is some experimental code – commented out – that hands the newly trained messages off to procmail for delivery rather than manually filing them. This means that mail should wind up where it would have wound up if it had been correctly classified in the first place. In principle, it works. However, it has two major drawbacks:
- There is no error-checking. If the procmail pipe somehow fails the script will merrily delete the mail anyway. This can probably be fixed by checking error codes and only deleting if all was happy.
- Since procmail will see and deliver the message again, and SpamAssassin will now score it as fairly solidly ham or spam, SA may treat it as a new message and retrain on it, thus biasing the Bayes DB. If anyone has an idea on how to fix this, please let me know.
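
The first drawback (deleting mail even when the pipe fails) could be addressed along these lines. This is a hedged Python sketch, not the commented-out code from the script: the function name redeliver and the deliver_cmd parameter are invented for illustration, and it relies only on procmail reading the message from standard input and exiting with status 0 on success. It does nothing about the second drawback.

```python
import os
import subprocess


def redeliver(path, deliver_cmd=("procmail",)):
    """Pipe the message at `path` to deliver_cmd; delete it only on success.

    Returns True if the command exited with status 0 and the file was
    removed, False if delivery failed and the file was left in place.
    """
    with open(path, "rb") as msg:
        result = subprocess.run(
            list(deliver_cmd),
            stdin=msg,
            stdout=subprocess.DEVNULL,  # stderr stays visible for cron
        )
    if result.returncode != 0:
        return False  # keep the message so nothing is lost
    os.remove(path)
    return True
```

A message that fails delivery stays in its Train folder and is simply retried on the next run.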