This is a placeholder for the Phisher project. Everything here may move.
Phisher is a SpamAssassin plugin which looks for anchors whose text resembles a domain name but whose href does not match the text.
For example, these would be caught:
The function also normalizes URLs and domains, so anchors whose text and href differ only superficially should not be caught. For example:
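To make the idea concrete, here is a minimal sketch of the check in Python. It is not the plugin's code (the plugin is Perl and runs inside SpamAssassin), and everything in it is an assumption made for illustration: the names `DOMAIN_LIKE`, `normalize_host` and `is_mismatch`, the rule that a leading "www." and letter case are ignored, and the sample hosts.

```python
import re
from urllib.parse import urlsplit

# Hypothetical test for "anchor text that resembles a domain name".
DOMAIN_LIKE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$")


def normalize_host(value):
    """Reduce a URL or bare domain to a comparable host name.

    The normalization rules here (lowercase, drop "www." and a trailing
    dot) are assumptions, not the plugin's actual rules.
    """
    value = value.strip().lower()
    if "://" not in value:
        value = "http://" + value  # lets urlsplit locate the host part
    try:
        host = urlsplit(value).hostname or ""
    except ValueError:
        return ""  # text that cannot be parsed as a URL at all
    host = host.rstrip(".")
    if host.startswith("www."):
        host = host[4:]
    return host


def is_mismatch(anchor_text, href):
    """True when the visible text looks like a domain but the href goes elsewhere."""
    text_host = normalize_host(anchor_text)
    if not DOMAIN_LIKE.match(text_host):
        return False  # visible text is not domain-like, so nothing to compare
    return text_host != normalize_host(href)
```

With these assumed rules, `is_mismatch("www.bank.example", "http://203.0.113.7/login")` is flagged, while `is_mismatch("Bank.Example", "http://www.bank.example/index.html")` is not, because both sides normalize to the same host.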
Feeble instructions for using it are in the header. If you are not comfortable screwing around with your SA config, or if you do not have access to the site-wide config files (local.cf), you probably won’t be able to use this yet.
Why This Is A Bad Idea
This approach has been suggested before, usually as a regexp. Some people don’t like the general approach because it can lead to false positives:
My opinion is that this is more a matter of setting appropriate scores, and letting the presence of a mismatched anchor inform SA, than a reason to avoid the rule because it might be wrong; many SA rules produce false positives (FPs) all the time. Further, I don’t think you can implement this as a single-line regexp: the string normalization becomes too hairy, and the pattern breaks down all over the place. I tried it that way at first and it was a mess.
To Do
- Validate on SA 3.1.x releases after 3.1 (3.1 itself works)
- Replace the regexp with HTML::Parser-based matching using uri_anchor_text.
- Test with larger corpus (volunteers? Bueller?)
- Figure out what a good target score should be (or let the scoring system figure it out?)
- Edge cases
  - what happens if there are nested HTML tags
  - what happens if there are improperly nested HTML tags (<a><a></a>)
  - what happens if there are spaces in the href
  - what happens if there are spaces in the visible URLs and %20s (etc.) in the href
- Get more people testing, and then, depending on feedback, either
  - Submit to SA for inclusion in an upcoming version, or
  - Repackage as a real Perl module so people can easily install it and run it alongside SA
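The parser-based matching and the edge cases listed above can be explored with a small experiment. The sketch below uses Python's standard `html.parser` as a stand-in for HTML::Parser; it is not the plugin's code, and the class name `AnchorCollector`, the helper `anchors`, and the choices it makes (a new `<a>` closes any anchor still open, `%20`-style escapes in the href are decoded, surrounding whitespace is stripped) are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class AnchorCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []     # finished (href, text) pairs
        self._href = None   # href of the anchor currently open, if any
        self._chunks = []   # text pieces seen inside the open anchor

    def _close_anchor(self):
        if self._href is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.pairs.append((self._href, text))
        self._href, self._chunks = None, []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # nested tags such as <b> are skipped, their text is kept
        self._close_anchor()  # assumed policy: <a> inside <a> ends the earlier one
        href = dict(attrs).get("href")
        if href is not None:
            # Decode %20 and similar escapes, then trim surrounding spaces.
            self._href = unquote(href).strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._close_anchor()

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def close(self):
        super().close()       # flushes any buffered text first
        self._close_anchor()  # then emit an anchor that was never closed


def anchors(html):
    """Return the (href, visible_text) pairs found in an HTML fragment."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Feeding it fragments such as `<a href="http://evil.example/"><b>www.</b>bank.example</a>` shows what visible text a comparison step would actually see once the nested tags are stripped away.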