Type of Spam That Almost Always Gets Through

After 10 years of use and careful training, SpamSieve still has a 98.6% accuracy rate. But I think I have found its “spam kryptonite”. Grammatical, conventionally phrased and formatted emails offering my business “credit, lines of credit, financing, or loans” all get through with a score of 27 (or 5, like this one) and are almost always “Predicted Good”.
I have trained it on a few dozen of these, but one or two still get through every week. Not a big problem, just an observation. I did think, though, that repeated training would somehow reveal a pattern in these annoying solicitation emails.

That’s a low accuracy rate for SpamSieve in general, and if you’ve been training for 10 years there’s likely a lot of information that’s now out-of-date. It would probably help to choose Reset Corpus from the File menu, then re-train SpamSieve with a smaller number of recent messages.

It’s been right at about 98.6% for years. No doubt there’s old stuff in the corpus, so I guess I can retrain. I will need to save up some spam for a few days to reach the right training ratio of 65/35, as I tend to trash spam fast and had been set to auto-delete any spam I train manually.

Well, I took the plunge. I had treated my 10-year-old corpus like delicate sourdough starter, and the thought of resetting and discarding it was scary. But I did it. I retrained with 65 spam and 35 good messages, and I am on my way again! New stats show 99.1% accuracy.

Ah, good old Bayesian. A perfect example of the impact of prior odds (as I now realise)
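
To put rough numbers on the prior-odds point, here is a minimal sketch of Bayes' rule in odds form. It is a generic illustration, not SpamSieve's actual scoring: the function name and the likelihood ratio of 3 are made up for the example. It shows the same evidence giving a different spam probability depending on how much of the training mail was spam.

```python
def posterior_spam_probability(prior_spam, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_spam / (1.0 - prior_spam)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical evidence: the message's words are 3x as likely
# to appear in spam as in good mail.
LIKELIHOOD_RATIO = 3.0

# Same evidence, three different shares of spam in the training mail.
for prior in (0.35, 0.50, 0.65):
    p = posterior_spam_probability(prior, LIKELIHOOD_RATIO)
    print(f"prior spam share {prior:.2f} -> posterior spam probability {p:.2f}")
```

With these made-up inputs the posterior comes out at roughly 0.62, 0.75 and 0.85, so a borderline message can land on either side of a cutoff purely because of the prior.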

I spoke too soon. Accuracy is falling steadily and is now down to 96%. Obvious spam is being labeled Good, and false negatives are way up since I started a new corpus with an initial training of 100 current messages (65% spam, 35% good). Maybe this will get better, but it is currently worse. Many obvious spam messages score “27” and are Predicted Good, and several score “0”. My slider is 5 marks from the right, slightly toward “Aggressive”.

Zero means that either you trained that exact message as good or that the sender is in your Contacts. If you use the Save Diagnostic Report command in the Help menu and send me the report file, as described here, I can investigate further.

Sent, thanks (along with a couple of log screenshots showing what seem to be errors).