Training? What training? Useless effort

MacMyDay · June 11, 2008, 10:46am

For the last month or two, I seem to be missing email sent from my brother, whose address is in my Mac Address Book. SpamSieve sends his mail to the SPAM folder. No worries: I find the email in the spam folder and use SS's "TRAIN AS GOOD" command, figuring this would do the trick. I've trained each of the 5 emails received over two days just this weekend. Today he calls me and asks if I've seen his email. Of course not, because SS is tossing it into the SPAM category.

So I’ve concluded that the TRAINING commands in SS are just some sort of placebo making one feel better with the illusion that one is actually affecting the behaviour of SS.

Not sure why this is happening, but it's alarming enough to toss SS itself into the SPAM folder and move on with life and another solution.

Has anyone seen this before?

Michael_Tsai · June 11, 2008, 10:57am

If the sender address is in Address Book and you have Use Mac OS X Address Book checked in the preferences, SpamSieve will never put the messages in the spam mailbox. You can look in SpamSieve’s log to see whether it predicted those messages to be spam. My guess is that it didn’t, in which case the problem lies elsewhere.

For example, another junk filter (on your mail server or locally) could be moving the messages, as could a rule that you created. Which mail program are you using? More information is essential in order to determine where the problem is.

Or maybe training SpamSieve didn’t help because SpamSieve wasn’t what was putting those messages in the spam mailbox.

MacMyDay · June 11, 2008, 11:29am

I've attached screenshots that show that junk filtering in Mail.app (Mac OS X 10.5.3) is inactive and that Use Mac OS X Address Book is active in SS.

For example, another junk filter (on your mail server or locally) could be moving the messages, as could a rule that you created. Which mail program are you using? More information is essential in order to determine where the problem is.

Mail.app, and again, I've disabled JUNK filtering. The address is .Mac, and even if .Mac were doing server-side filtering, it would filter the messages before they reached the SPAM mailbox I defined when setting up SpamSieve. So this cannot be the problem.

[…] maybe training SpamSieve didn't help because SpamSieve wasn't what was putting those messages in the spam mailbox.

I'd like to think this was the case, but unfortunately the fact is that mail from the address I've "trained" in SS more than a dozen times continually gets filed (by SS) in the SPAM folder On My Mac. It seems that the training in SS is really not working, or for some reason it is ignoring my relentless, yet useless, effort to train it to let my bro's mail come through, clean and simple, without marking it as SPAM.

Michael_Tsai · June 11, 2008, 11:48am

You keep stating that, but you haven’t shown any evidence that SpamSieve is what’s doing this. Does the log file say that SpamSieve predicted the message to be spam? (If you don’t understand how to read the log file, please send it to me.)

Also, if you e-mail me your rules file:

/Users/<username>/Library/Mail/MessageRules.plist

I can check your setup in Mail.

The training is irrelevant in this case, because you said the address was in your address book. Even if you trained similar messages as spam, SpamSieve would still predict future ones to be good, because of the address book.

MacMyDay · June 11, 2008, 12:26pm

Yes. The email was predicted as SPAM:

Predicted: Spam (100)
Subject: Toinight
From: name@domain.com
Identifier: c+IZX8/ywtoc5BY1KewOFA==
Reason: has encoded HTML part
Date: 2008-06-09 07:11:19 -0700

Also, if you e-mail me your rules file

Okay. On its way to you

The training is irrelevant in this case, because you said the address was in your address book. Even if you trained similar messages as spam, SpamSieve would still predict future ones to be good, because of the address book.

Actually, Michael, training is relevant, because I just discovered that my brother has two email addresses (or perhaps one address plus an alias): one with his first name, middle initial, and last name separated by periods, firstname.middleinitial.lastname@domain.com, and the other simply firstname.lastname@domain.com. It appears that the address with the middle initial was NOT in my address book.

Notwithstanding that, the training should have overridden the spam detection and let the email stand as good, since that is how I trained it. And if an email that is trained as good doesn't get marked as good (a false positive), then what good is training, and how many others are slipping through?

And as for evidence that SS is filing the email, please note that I have no rules or other spam filters, and the only mail filed in the SPAM folder is mail that SS puts there. It can't be anyone or anything else.

thanks for your help!

Michael_Tsai · June 11, 2008, 12:55pm

This shows that SpamSieve looked at the message and thought it was spam because it contained an encoded HTML part.

Many spammers encode the contents of their messages with base-64 so that filters cannot see the incriminating words they contain. SpamSieve can decode and look inside these messages. The "Encoded HTML mail is spam" option causes it to mark all such messages as spam, regardless of their contents, on the theory that legitimate senders do not try to obscure their messages. In practice, this is usually a safe assumption, and the option's low-priority position on the filters list means that encoded messages will not be classified as spam if the sender is in the address book or if the message matches the whitelist. However, if, for whatever reason, you expect to receive encoded messages that are not spam, you can uncheck "Encoded HTML mail is spam" in order to prevent SpamSieve from classifying them as spam on that basis.
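To make "encoded HTML part" concrete, here is a minimal sketch using Python's standard `email` package. It is an illustration under stated assumptions, not SpamSieve's actual code: the function names `has_encoded_html_part` and `decoded_html_parts` are hypothetical, and the check simply looks for a `text/html` part whose `Content-Transfer-Encoding` is `base64`. The second function shows that decoding the part recovers the words a content filter would otherwise miss.

```python
from email import message_from_string
from email.message import Message


def has_encoded_html_part(msg: Message) -> bool:
    """True if any text/html part uses base64 transfer encoding.

    Hypothetical illustration of what an "Encoded HTML mail is spam"
    style check might look for; not SpamSieve's actual logic.
    """
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if encoding == "base64":
            return True
    return False


def decoded_html_parts(msg: Message) -> list[str]:
    """Decode every text/html part so its words can still be inspected."""
    texts = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64 or quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                texts.append(payload.decode(charset, errors="replace"))
    return texts


# Hypothetical sample message; the body is "<b>Hi</b>" encoded in base64.
RAW = (
    "From: name@domain.com\n"
    "Subject: Example\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+SGk8L2I+\n"
)

msg = message_from_string(RAW)
print(has_encoded_html_part(msg))  # True
print(decoded_html_parts(msg))     # ['<b>Hi</b>']
```

A rule this blunt flags the message without ever looking at the decoded words, which is why it can override content training.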

OK, that explains why Use Mac OS X Address Book didn’t help. You should add this address to his card in Address Book.

Since the address wasn’t in the address book, the training becomes relevant. The normal SpamSieve behavior is that when you train a message as good, it adds the sender address to the whitelist. From then on, SpamSieve will predict all future messages from that address to be good.

In order for this to work, Use SpamSieve whitelist and Train SpamSieve whitelist must be selected in the preferences (which they are, in the default configuration). At this point, my best guess is that you received a spam message from your brother’s address, trained it as spam, and this disabled the whitelist rule. However, in order to know for sure I would need you to send me the log file:

/Users/<username>/Library/Logs/SpamSieve/SpamSieve Log.log

Ordinarily, this alone wouldn’t cause SpamSieve to think the messages from your brother were spam, because it would learn to recognize them as good by their contents. However, as noted above, in the default configuration SpamSieve will classify messages with encoded HTML parts as spam, regardless of their contents. This won’t be a problem in the future if you uncheck “Encoded HTML mail is spam.”

Thanks for sending the MessageRules.plist file. Please note that the installation instructions specifically recommend that you call your spam mailbox “Spam” rather than “SPAM”, although that’s unrelated to the specific problem discussed above.

MacMyDay · June 11, 2008, 2:12pm

Michael_Tsai:

In order for this to work, Use SpamSieve whitelist and Train SpamSieve whitelist must be selected in the preferences (which they are, in the default configuration). At this point, my best guess is that you received a spam message from your brother’s address, trained it as spam, and this disabled the whitelist rule. However, in order to know for sure I would need you to send me the log file

Okay, I've sent you the log file.

Ordinarily, this alone wouldn’t cause SpamSieve to think the messages from your brother were spam, because it would learn to recognize them as good by their contents. However, as noted above, in the default configuration SpamSieve will classify messages with encoded HTML parts as spam, regardless of their contents. This won’t be a problem in the future if you uncheck “Encoded HTML mail is spam.”

Yeah. But I probably don't want to do that and end up with more false positives than I can handle in a day.

Thanks for sending the MessageRules.plist file. Please note that the installation instructions specifically recommend that you call your spam mailbox “Spam” rather than “SPAM”, although that’s unrelated to the specific problem discussed above.

I’ll make that change just to be ‘Buttoned up’…

But I hope we can find out what goes on with this training issue. Seems, as I said before, it’s a placebo. I’ve been very diligent in the past in training mail both good and bad as suggested – but my feeling is this is just a keyboard exercise rather than an effective means to reduce spam??? Let me know what you find out after looking at the log file and maybe that’ll shed light on the problem.

thanks so much for your quick response and interest in solving this problem.

MMD.

Michael_Tsai · June 11, 2008, 4:46pm

The reason to uncheck “Encoded HTML mail is spam” is to prevent false positives (good messages classified as spam). That’s why I recommend it for you. I don’t think it would increase the number of false negatives (uncaught spam messages) much, since your SpamSieve is already trained.

I guess I didn’t explain it very well above. What I was trying to get across is that if the message is encoded and you have “Encoded HTML mail is spam” checked, the training of SpamSieve to recognize messages by their content will be ignored. (The training of the whitelist and blocklist will still matter, though, because those are higher in the filters list.)

Looking at the log file that you sent, SpamSieve did not add your brother’s address to the whitelist when you trained the message as good. That likely means that the rule is already on the whitelist but that it’s disabled. I would need to see the old log files to be sure, but probably you had received some spams from his address, trained them as spam, and so SpamSieve disabled the whitelist rule (because it usually doesn’t make sense to whitelist an address that sends you spam; if you want to unconditionally accept messages from that address, it should be in your address book). I recommend opening the whitelist window and clicking the checkbox to make the rule for his address enabled again.
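The whitelist behavior described here (training as good adds a rule, training as spam disables it, and retraining as good leaves an existing disabled rule alone) can be captured in a small toy model. This is a hypothetical sketch in Python that mirrors the explanation in this thread; the `Whitelist` class and its methods are illustrative names, not SpamSieve's real data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Whitelist:
    """Toy model of a sender whitelist whose rules can be disabled.

    Hypothetical: mirrors the behavior described in this thread,
    not SpamSieve's actual implementation.
    """
    rules: dict[str, bool] = field(default_factory=dict)  # address -> enabled?

    def train_good(self, sender: str) -> None:
        # Only adds a rule if none exists; an existing (possibly disabled)
        # rule is left untouched, so retraining as good does not re-enable it.
        self.rules.setdefault(sender.lower(), True)

    def train_spam(self, sender: str) -> None:
        # Spam from a whitelisted address disables that address's rule.
        key = sender.lower()
        if key in self.rules:
            self.rules[key] = False

    def enable(self, sender: str) -> None:
        # Equivalent of ticking the rule's checkbox in the whitelist window.
        self.rules[sender.lower()] = True

    def matches(self, sender: str) -> bool:
        return self.rules.get(sender.lower(), False)


wl = Whitelist()
addr = "firstname.middleinitial.lastname@domain.com"
wl.train_good(addr)   # first good message creates an enabled rule
wl.train_spam(addr)   # a spam from the same address disables it
wl.train_good(addr)   # rule already exists, so nothing changes
print(wl.matches(addr))  # False
wl.enable(addr)
print(wl.matches(addr))  # True
```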

I think the behavior with the messages from your brother is an aberration. It was a perfect storm where:

1. The address wasn't in your address book.
2. The whitelist rule was disabled because you had received spam from that address.
3. The message included an encoded HTML part, which is very rare for non-spam messages.
4. Due to (3), SpamSieve didn't get to examine the contents of the message, so the corpus training was ignored.

Normally, a regular correspondent would already be in your address book, or there would be a whitelist rule (from previously receiving a message from that address), or SpamSieve would examine the message contents and find them non-spammy. So there are multiple safeguards to prevent good messages from ending up in the spam mailbox.
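The "multiple safeguards" argument can be checked with a toy priority chain in which the first filter that has an opinion decides. This Python sketch is hypothetical: the `Msg`/`Setup` types, the boolean `corpus_says_spam` standing in for the trained content classifier, and the omission of the blocklist are simplifications for illustration. Only the relative order (address book and whitelist above the encoded-HTML rule, which sits above the corpus) comes from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    sender: str
    has_encoded_html: bool
    corpus_says_spam: bool  # stand-in for the trained content (corpus) classifier


@dataclass
class Setup:
    address_book: set[str] = field(default_factory=set)
    whitelist: dict[str, bool] = field(default_factory=dict)  # address -> enabled?
    encoded_html_is_spam: bool = True


def classify(msg: Msg, s: Setup) -> tuple[str, str]:
    """Return (verdict, deciding filter); the first filter with an opinion wins."""
    if msg.sender in s.address_book:
        return "good", "address book"
    if s.whitelist.get(msg.sender, False):
        return "good", "whitelist"
    if s.encoded_html_is_spam and msg.has_encoded_html:
        return "spam", "encoded HTML"
    # Only reached if no higher-priority filter decided.
    return ("spam" if msg.corpus_says_spam else "good"), "corpus"


brother = "firstname.middleinitial.lastname@domain.com"
storm_msg = Msg(brother, has_encoded_html=True, corpus_says_spam=False)
storm = Setup(address_book={"firstname.lastname@domain.com"}, whitelist={brother: False})
print(classify(storm_msg, storm))  # ('spam', 'encoded HTML')

# Removing any one condition of the storm lets the message through.
print(classify(storm_msg, Setup(address_book={brother})))      # ('good', 'address book')
print(classify(storm_msg, Setup(whitelist={brother: True})))   # ('good', 'whitelist')
print(classify(storm_msg, Setup(encoded_html_is_spam=False)))  # ('good', 'corpus')
```

In this model the corpus verdict is never consulted in the storm case, which matches the point that content training was ignored for encoded messages.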

As far as reducing spam, I think you’ll see better results if you more promptly “Train as Spam” spam messages that get through, or else turn off auto-training. You could also move the aggressiveness slider back to the middle or a bit to the right—right now, it’s set to be a little more conservative than normal.