Why so much SPAM?

LCPGUY · March 20, 2008, 10:07pm

This program always worked so well, but now it is letting in so MUCH spam. What is going on here? And I have changed nothing. This used to be the best of the best, but lately, it sorta sucks! Sorry! And no, I haven’t changed any options, etc.

Michael_Tsai · March 21, 2008, 7:19am

You haven’t provided enough information for me to know. If you like, you can check the log by choosing Open Log from the Filter menu and then follow the instructions on this page. Or, send me a full report and I’ll take a look.

mtgentry · April 6, 2008, 9:35pm

I’m having the same issue. It seems like in the past two months much more spam is getting through. It’s about 4-5 spam emails a day, which isn’t too bad, but it used to be zero. I recently changed my email settings on my account from POP to IMAP. Could that be the cause?

Also the spam emails are usually very short. Like a sentence or two.

LCPGUY · April 6, 2008, 10:23pm

Michael, I appreciate your concern and advice. But I must say that I just want something that works like it used to. I do not want to have to be a paid customer and then have to participate in debugging.

Michael, nothing has changed here on my end. This problem is not unique to me as others have it too. Something has changed in SpamSieve, and not for the better.

I just got my third false negative from an email from jcpenny.com. I marked the first two as “train as good” yet now their 3rd email was still marked as spam. I also continue to get “porn” emails that SpamSieve is not catching anymore like it used to, even ones that contain words that I can’t repeat here.

I have changed nothing except for updating to your latest version.

Michael_Tsai · April 7, 2008, 6:27am

Perhaps. Please see the previous post.

Michael_Tsai · April 7, 2008, 7:22am

I don’t consider this debugging. It’s more like, something isn’t working on your Mac, so you either try to fix it yourself or you get a pro to look at it. I certainly appreciate that you want something that just works, and I would love to develop a product that never required technical support. But here we are. If you want help, I’m happy to provide it, but I really do need some more information.

Well, here’s how I see these issues:

1. It can be hard to know if something has changed on your end. People always say nothing has changed, and often they’re right. But sometimes they forgot that they upgraded their OS or installed an application, or a haxie, or another Mail plug-in. (Seriously.) Sometimes they didn’t do anything, but an important OS or SpamSieve file got corrupted without their knowledge. I think you can only be sure if you restore from a full system backup made when everything was working properly.
2. It’s not at all clear what others have. With a few exceptions, people only post in the forums when there’s a problem. At any given time, somebody will have a problem, but that in itself usually doesn’t point to the cause. Based on the e-mails I’ve received, I can tell you that the vast majority of users have not reported any change compared to the previous version of SpamSieve. A few have said that it’s much better, and a few have said that it’s much worse.
3. I’ve tried to look into every single case of the latter, but I didn’t find anything interesting. In most cases, something about the user’s mail or SpamSieve setup had changed. In some cases, they had not been correcting SpamSieve’s mistakes, and it took a while for this to present as reduced accuracy. A couple of people were convinced that something had changed in SpamSieve, but when they reverted to the previous version they didn’t notice a change. We also compared the way the current and previous versions of SpamSieve analyzed particular spam messages, and there was no difference. SpamSieve 2.6.6 has now been out for 2 1/2 months. Usually if there’s a problem, I get lots of e-mail about it within the first day or two. Here, it’s been nothing like that, so I’m pretty confident that there have not been any accuracy regressions. Of course, there’s always the possibility of finding a bug, and I’m always interested in looking into specific cases, but I want to stress that this is a specific case. There’s no epidemic.

It’s important to remember that nothing is static. The spam messages that you receive are changing. The good messages are changing, too. SpamSieve’s corpus (the collection of messages that it’s been trained with), blocklist, and whitelist are also changing. The mail program setup and address book may also be changing. And all these vary among users. This is a strength of SpamSieve, in that it can be more accurate by adapting to your very own mail. However, it also means that it won’t behave in exactly the same way on any two Macs. Thus, if there’s a problem, it’s especially important to send in a report. Otherwise, I know that it’s working on most Macs, but not on this particular one, and I haven’t the slightest idea what’s different between them.

A good message that SpamSieve classified as spam is called a false positive. What does SpamSieve’s log say about these jcpenny.com messages? Does jcpenny.com appear in any rules on SpamSieve’s whitelist? Are the rules enabled (checked)?

This aspect of SpamSieve’s filtering is incredibly simple. If you train a message as good, it will add the address to the whitelist, and future messages from that sender will always be classified as good. If this is not happening, the most likely possibilities are:

The mail program is set up incorrectly, such that SpamSieve is not being asked to examine these particular messages. Some other rule or filter is moving the messages to the Spam mailbox.
SpamSieve’s whitelist was turned off.
One of the messages was trained as spam, causing the whitelist rule to be disabled.
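To make the whitelist logic above concrete, here is a minimal, hypothetical model of it. This is not SpamSieve’s code: the names `Whitelist`, `WhitelistRule`, `train_good`, `train_spam`, and `is_whitelisted` are invented for illustration, addresses are assumed to match case-insensitively, and the real program also consults the blocklist, the corpus, and other settings. It only encodes the three facts stated here: training as good adds the sender, training a message from that sender as spam disables the rule, and nothing is whitelisted when the whitelist is turned off.

```python
from dataclasses import dataclass, field


@dataclass
class WhitelistRule:
    """Hypothetical whitelist entry: a sender address plus an enabled (checked) flag."""
    address: str
    enabled: bool = True


@dataclass
class Whitelist:
    """Hypothetical model of the whitelist behaviour described in the post."""
    turned_on: bool = True  # the whitelist option as a whole
    rules: dict = field(default_factory=dict)

    def train_good(self, sender: str) -> None:
        # Training a message as good adds a rule for its sender.
        # Assumption: an existing (possibly disabled) rule is left untouched;
        # the thread does not say whether the real program re-enables it.
        key = sender.lower()
        if key not in self.rules:
            self.rules[key] = WhitelistRule(key)

    def train_spam(self, sender: str) -> None:
        # Training a message from a whitelisted sender as spam disables that rule.
        rule = self.rules.get(sender.lower())
        if rule is not None:
            rule.enabled = False

    def is_whitelisted(self, sender: str) -> bool:
        # A message is forced to "good" only if the whitelist is on
        # and an enabled rule matches the sender.
        if not self.turned_on:
            return False
        rule = self.rules.get(sender.lower())
        return rule is not None and rule.enabled


wl = Whitelist()
wl.train_good("news@example.com")
print(wl.is_whitelisted("news@example.com"))  # True
wl.train_spam("news@example.com")
print(wl.is_whitelisted("news@example.com"))  # False: the rule is now disabled
```

In this sketch, once a rule is disabled, only re-enabling it by hand (setting `enabled = True`, the equivalent of checking the rule) makes future messages from that sender count as good again.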

Does SpamSieve’s log say that these messages were “Predicted: Good”?

LCPGUY · April 7, 2008, 9:54am

I found the problem Michael. It was my fault. Somehow the option in SpamSieve “Check for message in corpus” was unchecked. I’m sure things will work much better now that I have it checked again.

By the way, “Use Habeas Safelist” is checked. Is that good, or does it do anything if I don’t subscribe to that service?

Thanks for all your help.

John

Michael_Tsai · April 7, 2008, 10:18am

Although I do recommend that this option be checked, I don’t think that not checking it would cause the problems that you reported.

It can do some good (prevent false positives) whether or not you subscribe to their service. However, the effect is not that large. In order for it to make a difference, you need to receive a good message that SpamSieve thinks is spam (rare), and the sender of that message must use the Habeas service (also rare). There’s no harm in using the safelist, though, so I recommend leaving it checked.

LCPGUY · April 7, 2008, 10:27am

Well, I noticed in the log file that jcpenny was constantly being identified as spam. I then chose “Train as good”, and the log file said it was trained as good. Then the next message I got from jcpenny was once again flagged as spam. This cycle just kept repeating itself over and over.

I guess I’ll just have to wait and see, unless you have any other ideas.

John

Michael_Tsai · April 7, 2008, 11:08am

At the risk of repeating myself, please send me the log file. And please answer the questions I asked about the whitelist.

Michael_Tsai · April 7, 2008, 11:53am

Thanks for sending the log and the whitelist screenshot.

The relevant whitelist rule for JCPenney (created on 2/5) is disabled, which can only happen if you disable it manually or if you train as spam a message that was sent from that address (which would also make SpamSieve tend to think that the contents are spammy). If you want future messages from that address to always be classified as good, you should enable the rule. Otherwise, SpamSieve will eventually learn to recognize them, but because they’re marketing-type messages, they’re similar to spam, so it may need to see several of them first.

The overall accuracy is about 98.6%, which seems OK, although not quite as high as I’d like. The log goes back to 4/2 and shows no spam messages let through in that time. So I would need to see one of the archived log files (in the same folder) to know why spam messages were getting through.

Part of the reason that you’re seeing good messages classified as spam is that you have SpamSieve set to be extra aggressive.

One false positive (“famine in the land”) was because the message was sent from an address on the blocklist, probably meaning that you’d trained a message from that sender as spam.

Another false positive (“08-14 Digest”) was a mailing list message containing a PDF file. The PDF contained some words that had previously appeared only in PDFs attached to spam messages. This type of problem can be avoided if you use a representative mix of your good and spam messages in the initial training.

mtgentry · April 13, 2008, 10:26pm

Thanks for the email support, Michael. Resetting the corpus did the trick.

joeflyer59 · April 27, 2008, 1:28pm

New Breed of Spam?
My filter has been letting in a horrendous amount of spam over the last few months as well. I’ve lurked around the forums here a bit and learned a lot. I’ve reset the corpus (because I had way too many messages for it to train with), and while that helped a bit, for the most part it didn’t.

One thing I’ve noticed is that there is mainly ONE type of spam message that gets through; it catches everything else. It comes with a ton of different types of content and from a ton of different senders, but it’s basically one line of text and a URL link. No images, nothing major. Obviously there must be something under the “hood” of these emails which does not allow them to be trained properly and stumps the filter.

Anyone else?

joey b

Michael_Tsai · April 27, 2008, 2:07pm

What does SpamSieve’s log say about those?

joeflyer59 · April 27, 2008, 6:21pm

log entries
Here’s the log for 3 messages.

Michael_Tsai · April 28, 2008, 6:47am

Please e-mail me the whole log file and the false negative files.

Michael_Tsai · April 30, 2008, 6:37am

Thanks for sending the whole log. It doesn’t look like you’ve trained SpamSieve with any spam messages since April 23. So, whenever a spam message gets through, SpamSieve thinks it was correct that the message was good, and this leads to more getting through.

joeflyer59 · April 30, 2008, 6:49am

Thanks for checking it out Michael!

April 23 wasn’t that long ago. I’ve trained a ton of these types of messages and kinda gave up hope since then. I don’t want my corpus to get so huge that it becomes totally ineffective again. But this spam has been slipping through for a while. Any other clues?

joey b

Michael_Tsai · April 30, 2008, 7:17am

It’s essential that you train as spam every spam message that gets through. Otherwise, SpamSieve will think that they’re good. Since you haven’t been doing that recently, you’ll need to reset the corpus and start over. For best results, re-train it with some recent messages that are representative of the types you are currently receiving.

I did see some interesting stuff in the log file, and I’ll be making improvements in the next version of SpamSieve.