Last year I posted about this and ended up resetting my corpus as advised. At the time I had not done so in more than 8 years and was at 98.6% accuracy. Now I am at 98.1%, and 4-5 obvious spams are getting through weekly. Today, for example, I trained one manually and got virtually the same message hours later, passed as “Good” (see the top and bottom messages in this log shot: https://i.imgur.com/gyP20AT.jpg).
I trained it too, as I am meticulous about training every spam manually, but they persist. A small annoyance, but a regular one, and all are obvious spam “on sight,” as in, I don’t even have to open them to know they are junk. They often score 1, 11, or 27.
Again, I have trained a few dozen of these, but several get through every week. Not a big problem, just an observation. I did think, though, that repeated training would somehow reveal a pattern in these solicitation emails that annoy me.
Please use the Save Diagnostic Report command in the Help menu and send me the report file, as described here. It may also help to drag some of the log entries to Finder to export those messages and include them with the diagnostic report.
I don’t know whether it was blocked by your outgoing mail server, but I did not receive anything. Another option would be to send the files via private forum message.
Odd. I sent it again and copied myself, and did not get it. I just sent it again from my Gmail.
I got it this time, thanks. The statistics are saying that over the last month it’s been 99.5% accurate.
A score of 1 is not due to the training in the corpus. It means that you had received a previous message from the same sender name or address that SpamSieve had thought was good. (Sometimes this could be from a sequence of spam messages from the same sender and you hadn’t had time to train the first one as spam yet.)
However, this time it happened with the “O’Reilly Auto Parts” message, and it looks like you did not train the earlier spam from that sender (Identifier: 7qMfVjKyenAxjxfO51Y/kA==) as spam.
I don’t know whether there are more examples like that, but not correcting mistakes can certainly mess up the corpus, too. Many of the spam messages that got through because of the corpus did so because of words that had previously appeared in messages trained as good but none trained as spam.
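To illustrate how uncorrected mistakes skew the corpus, here is a minimal sketch of Bayesian word scoring in the style described above. This is a hypothetical toy model, not SpamSieve’s actual implementation; the class name, smoothing constants, and 0-100 score scale are all assumptions for illustration. Words that appear only in messages trained as good pull a message’s score down, which is why an untrained spam from the same sender can make a near-identical follow-up look good.

```python
from collections import Counter

class BayesianCorpus:
    """Toy sketch of word-based Bayesian spam scoring
    (hypothetical; not SpamSieve's actual algorithm)."""

    def __init__(self):
        self.good = Counter()   # word -> number of good messages containing it
        self.spam = Counter()   # word -> number of spam messages containing it
        self.good_msgs = 0
        self.spam_msgs = 0

    def train(self, words, is_spam):
        # Count each word at most once per message.
        (self.spam if is_spam else self.good).update(set(words))
        if is_spam:
            self.spam_msgs += 1
        else:
            self.good_msgs += 1

    def word_spam_prob(self, word):
        # Fraction of spam vs. good messages containing the word,
        # smoothed so unseen words stay at a neutral 0.5.
        g = self.good[word] / max(self.good_msgs, 1)
        s = self.spam[word] / max(self.spam_msgs, 1)
        return (s + 0.5) / (s + g + 1.0)

    def score(self, words):
        # Combine per-word probabilities into a 0-100 spam score.
        # Words seen only in "good" training pull the score down,
        # so failing to correct a mistake biases similar messages good.
        p_spam = p_good = 1.0
        for w in set(words):
            p = self.word_spam_prob(w)
            p_spam *= p
            p_good *= (1.0 - p)
        return round(100 * p_spam / (p_spam + p_good))
```

For example, training one message as spam and a similar one as good leaves the shared words neutral, while the unshared “good” words keep pulling later look-alikes toward a good classification until they, too, are trained as spam.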
I had my statistics date set to the month after I reset the corpus. If I change the date to look only since January, I am seeing 99.1% as well. That’s good news.
I certainly could have missed a message or two in training, but 99% of my missed spam messages are trained as spam when they get through, something I have been doing as long as I have been using this product.
In the case of the O’Reilly auto parts message, I literally trained one as spam and then, several hours later, received another identical message that was flagged as good. I will continue to train diligently, but 4 messages in the last 2 days (including another “O’Reilly Auto Parts”) that are obvious spam were marked good by SpamSieve. They too were trained as spam manually, as I (almost) always do.
It’s catching the vast majority (almost all, to be sure), but the really obvious spams being missed remain confounding. Thanks as always for the support.
It’s unsurprising that it would think the O’Reilly messages are good if there were multiple earlier ones that it thought were good and that you did not train as spam.
I suggest that you look in the log and train any green predictions from this sender as spam. In particular, search for the messages JqD4cTULLC4DO1OVFGvmLw== and 7qMfVjKyenAxjxfO51Y/kA==. You could also search for O'Reilly and see whether there are any older messages that need to be corrected.