A quick update since my last report on my current spam filtering setup.
Last week, I received over 87,000 inbound emails. Roughly 86,000, or 99%, were spam. About 1,000 messages made it through SpamSoap and my other server-based spam filters and were delivered to my client-side email program. Of those, 20% were still spam. My client-side Bayesian spam filter caught the bulk of them, leaving only a couple a day for me to delete manually. So, I’m still seeing better than 99.9% accuracy in catching spam without trapping legitimate personal messages.
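For anyone who wants to check the arithmetic, here is a rough sketch in Python using last week's rounded figures. Treating "a couple a day" as exactly 2 is my assumption for the sake of the calculation, so the result is approximate.

```python
# Rounded figures from last week's report.
inbound = 87_000             # total inbound messages
spam = 86_000                # messages that were spam
delivered_to_client = 1_000  # messages that passed the server-side filters
client_spam_share = 0.20     # share of delivered mail that was still spam
manual_per_day = 2           # assumption: "a couple a day" taken as 2

spam_reaching_client = delivered_to_client * client_spam_share
manual_per_week = manual_per_day * 7

# Overall accuracy: the share of all spam that some filter caught.
overall_accuracy = (spam - manual_per_week) / spam

print(f"Spam share of inbound mail: {spam / inbound:.1%}")
print(f"Spam reaching the client per week: {spam_reaching_client:.0f}")
print(f"Spam deleted by hand per week: {manual_per_week}")
print(f"Overall catch rate: {overall_accuracy:.3%}")
```

With these inputs the overall catch rate comes out a little above 99.9%, which matches the claim above.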
One glitch did crop up last week. Some spam gets around my server-side filters by bypassing the normal Internet mail-handling route (the MX record, for those who know what that means). Since no legitimate external messages do that, I set the mail preferences on my hosting server to throw those messages out. A problem arose with some emails that a coworker sent to one of my email aliases. Because they were coming from someone on the same web hosting account, they never went through the MX record, which is only used for outside mail. For about a month, those messages went into a black hole, and I didn’t even know it.
I’ve added more intelligence to the rules that throw away the non-MX mail. The coworker’s messages now go through, although I’m still seeing some odd cases where this doesn’t seem to work. It’s only a problem for mail from inside my company to certain email addresses. So, not crippling, but annoying nonetheless.
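The idea behind the revised rule can be sketched roughly as follows. This is not my actual server configuration: the `Message` type, the `arrived_via_mx` flag, and the `INTERNAL_DOMAINS` set are all hypothetical names made up for illustration, and how a given mail server detects a non-MX delivery will vary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str           # envelope sender address
    arrived_via_mx: bool  # hypothetical flag: did the message come in through the MX route?


# Placeholder domain standing in for addresses on the same hosting account.
INTERNAL_DOMAINS = {"example.com"}


def should_discard(msg: Message) -> bool:
    """Discard non-MX mail unless it comes from an internal sender."""
    if msg.arrived_via_mx:
        return False  # normal outside mail goes on to the regular spam filters
    domain = msg.sender.rpartition("@")[2].lower()
    return domain not in INTERNAL_DOMAINS


print(should_discard(Message("coworker@example.com", arrived_via_mx=False)))
print(should_discard(Message("stranger@spam.test", arrived_via_mx=False)))
```

One caveat with a rule like this is that sender addresses can be forged, so an exception keyed only on the sender's domain is a convenience rather than a real safeguard.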
Other than that, things are still working very well. The scary thought is that the volume of spam is still going up. It wasn’t that long ago that I marveled at getting 1 million spams a year. At this rate, I’ll get at least 5 million in 2007. And there’s no end to the growth in sight.
At 10 million spams per year, 99.9% filtering accuracy still means 25+ spams a day I have to handle manually, which is starting to get problematic again. Hopefully by that time I’ll be closer to 99.99% accuracy, but who knows. I remember just a few years ago when 80% accuracy was state of the art. Today that would be worse than useless at my volume of spam. It’s an arms race, and the spammers keep ratcheting up the volumes and the deviousness.
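To put numbers on that, here is a small sketch of the projection. The function name `manual_per_day` is just something I made up for this example, and the 365-day year and 52-week annualization are simplifying assumptions.

```python
def manual_per_day(spam_per_year: float, accuracy: float) -> float:
    """Spam messages per day that slip past filters running at the given accuracy."""
    return spam_per_year * (1 - accuracy) / 365


# Last week's roughly 86,000 spams, annualized with no further growth.
print(f"Current annual rate: {86_000 * 52:,} spams")

# Leftover spam per day at 10 million spams per year, for three accuracy levels.
for accuracy in (0.80, 0.999, 0.9999):
    leftover = manual_per_day(10_000_000, accuracy)
    print(f"{accuracy:.2%} accuracy: about {leftover:,.1f} spams a day by hand")
```

At 99.9% the leftover is a bit over 27 a day, and each extra "9" of accuracy cuts that by a factor of ten.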