SpamSieve Filters, Logs, Weirdness

moose · May 17, 2024, 8:14pm

Hey there Michael (wow, it’s really awesome that you respond to all of these!)

So I’ve noticed an issue with SpamSieve. Overall, I love it a lot … it’s generally pretty good at picking up spam, to the point that it’s won over my neuroticism (I think one of my last posts was about constantly double checking that it wasn’t sending good messages to the spam mailbox).

Anywho, once in a while, a REALLY obvious spam message will get through to the inbox. SS will predict it to be good, and then, as with spam, within seconds/minutes, that same sender will send a lot more messages which seems to reinforce the “good prediction.” So I’ll suddenly have 3+ spam messages in my inbox. I train as spam and move on, but I’m trying to figure out why those are being predicted as good. I’m fairly diligent with training as spam as soon as I notice (I don’t clear out my day without training all new emails that are spam). Yet multiple times per week, this will happen. The curious thing is those emails usually have a score of 27 exactly (there’s a part of me that just wants to say “all emails with a 27 score are spam haha”). I’m also not certain why it’s reprocessing the messages.

I’ve attached some images … I’d attach more, but the Log only sorts by “Date” for me and not “Type”, so I can’t filter down to just the ones predicted good at 27. The first example is interesting because it correctly identified a similar message as spam just the day before.


Michael_Tsai · May 17, 2024, 8:57pm

I’m working on a way to prevent that, but of course the main problem is if it gets the initial message wrong.

In that case you may want to set it to be slightly more aggressive, to move these across the borderline.

In the first message, there were some very unspammy parts of some URLs and an image. Some of this will probably be addressed with some changes that I’m working on.

There is also what appears to be bad luck, e.g. the word “Secure” in the subject was apparently almost never in your spam messages (until now). And same with the word “edward”.
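
(Illustration only, not SpamSieve's actual code: a rough Python sketch of how a generic Bayesian filter can end up with a low score when a couple of words have almost never appeared in spam, and how a more aggressive cutoff moves a borderline message across the line. The function names, the add-one smoothing, the 0-100 scale with a cutoff of 50, and every count below are assumptions made up for the example.)

```python
import math


def token_spam_prob(spam_count, good_count, n_spam, n_good):
    """Estimate P(spam | token) from training counts, with add-one smoothing.

    Hypothetical helper: a token seen mostly in good mail gets a value near 0.
    """
    spam_rate = (spam_count + 1) / (n_spam + 2)
    good_rate = (good_count + 1) / (n_good + 2)
    return spam_rate / (spam_rate + good_rate)


def combined_score(probs):
    """Combine per-token probabilities in log-odds space; return a 0-100 score."""
    log_odds = sum(math.log(p) - math.log(1 - p) for p in probs)
    return 100 / (1 + math.exp(-log_odds))


def is_spam(score, threshold=50):
    """Toy cutoff: lowering the threshold makes the filter more aggressive."""
    return score >= threshold


# Made-up corpus: 1000 spam and 1000 good messages already trained.
N_SPAM, N_GOOD = 1000, 1000

# One strongly spammy token (seen in 300 spam, 2 good messages).
spammy = token_spam_prob(300, 2, N_SPAM, N_GOOD)

# Two tokens that were almost never in spam until now.
secure = token_spam_prob(1, 120, N_SPAM, N_GOOD)
edward = token_spam_prob(0, 80, N_SPAM, N_GOOD)

score_without = combined_score([spammy])
score_with = combined_score([spammy, secure, edward])

print(f"spammy token alone: {score_without:.1f}")
print(f"with 'Secure' and 'edward': {score_with:.1f}")

# A borderline message scored 27 is good at the toy default cutoff,
# but spam once the cutoff is made more aggressive.
print(is_spam(27), is_spam(27, threshold=25))
```

In this toy model the two "unspammy" words outweigh the one spammy word, which is the bad-luck effect described above. Once those messages are trained as spam, the counts for those words shift and they stop pulling the score down as hard.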

moose · May 17, 2024, 9:05pm

Cool. Will do, just wanted to ensure I wasn’t missing something obvious before messing around with the Bayesian classifier. Thanks!

Ah … so that’s how I interpret the log. Very cool. Good to know. Thanks for that as well!

Hope you have a lovely day!

Michael_Tsai · May 18, 2024, 1:31am

Yes, it’s really important to make sure that the mistakes have all been corrected first, but it sounds like you’ve been doing that.