After training Popfile on more than 1,200 messages, I reset its statistics to get a better sense of its current accuracy. Since then, I’ve processed 400 more messages. Accuracy is 90% and seems to have leveled off there.
A 90% rate is well below what other Popfile users report, especially after this much training. On the other hand, it’s an improvement over the 70-80% I get with rule-based filters. By running some coarse-grained server-based filters before Popfile and two sets of rule-based filters (automated and hand-crafted) after it, I’ve cut the spam that actually reaches my inbox to three or four messages a day (99% accuracy).
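For anyone who wants to check the arithmetic behind those figures, here is a rough sketch in Python. It rests on two simplifying assumptions that go beyond what I measured: that "accuracy" means the share of spam caught, and that each filtering stage works independently of the others. The function names are made up for illustration and are not part of Popfile.

```python
def implied_daily_spam(leaked_per_day, overall_accuracy):
    """Daily spam volume implied by a leak count and an overall catch rate.

    Assumption: accuracy is the fraction of spam that gets caught.
    """
    return leaked_per_day / (1.0 - overall_accuracy)


def combined_catch_rate(stage_rates):
    """Catch rate of filters run one after another.

    Assumption: stages are independent, so miss rates multiply.
    """
    miss = 1.0
    for rate in stage_rates:
        miss *= 1.0 - rate
    return 1.0 - miss


def required_residual_rate(primary_rate, target_rate):
    """Share of the primary filter's misses the other stages must catch."""
    return 1.0 - (1.0 - target_rate) / (1.0 - primary_rate)


if __name__ == "__main__":
    # Three or four leaks a day at 99% overall
    for leaked in (3, 4):
        print(leaked, round(implied_daily_spam(leaked, 0.99)))
    # What the non-Popfile stages must add to lift 90% to 99%
    print(round(required_residual_rate(0.90, 0.99), 2))
```

Under those assumptions, three or four leaks a day at 99% works out to roughly 300 to 400 spam messages a day, and the filters around Popfile would need to catch about nine in ten of the messages it misses.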
That’s good enough to make the effort worthwhile, with two caveats. I’m still getting occasional false positives from Popfile, which makes me hesitant to stop the laborious process of manually training it. And I wonder whether the accuracy rate will drop as spammers change tactics.
While my personal spam nightmare may be receding, I still maintain that (1) most users with my volume of spam will opt for whitelists over such a complex, time-consuming filtering regime, and (2) my volume of spam will increasingly be the norm as email volumes continue to increase.