SpamSieve not recognizing new spam

awy · March 5, 2008, 2:36pm

I am having a lot of spam problems lately, and SpamSieve is suddenly having trouble keeping up.

I’ve noticed certain trends and types of spam that keep entering my mailbox, and it seems like SpamSieve really isn’t “learning” when I mark them. This was never a problem before. I think there must be something in them that is fooling the Bayesian methods. Any ideas?

Michael_Tsai · March 5, 2008, 4:42pm

One reason it might appear not to be learning is if something in your mail program or in SpamSieve isn’t configured properly. Please see this page for how to send me additional information so that I can see what’s wrong.

awy · March 5, 2008, 4:58pm

I’ve enabled the ‘save false negatives to disk’ option for the future.

I’ve been using SpamSieve for quite some time, and to my knowledge everything is set up correctly. It’s never been an issue previously.

Should I send my recent log file or wait until some false negatives are saved? <Edit - I emailed the log to the false negatives address>

Here’s an example from my log of something that is way out to lunch.

Why it was marked as good is beyond me…

"

Predicted: Good (17)
Subject: The World Most Effective PenisLonger Pill Is Now Available! i1uxhe40u7bh5qpl
From: “Patrice Davidson” <patricedavidson_zg@minmail.net>
Identifier: QYoZFCGbB0dpzZt96gqCOA==
Reason: P(spam)=0.000[0.290], bias=0.000, F:Davidson(0.998), herbalist(0.002), S:Available!(0.002), bysex(0.002), all-night(0.002), thepenis(0.002), smalldick(0.002), S:Effective(0.995), S:World(0.096), throbbing(0.138), CT:iso-8859-2(0.791), S:Most(0.761), fuller(0.242), S:Now(0.258), endorsed(0.267)
Date: 2008-02-22 14:16:35 +0100

"

awy · March 5, 2008, 5:00pm

PS…

What does this mean? Is the email somehow adding things to my whitelist automatically? I certainly didn’t tell it that this was “good” email…

=====================================================================
Trained: Good (Auto)
Subject: nelapdni
Identifier: i5F1ZujULablJgI4LU81gQ==
Actions: added rule <From (address) Is Equal to “_nelli@autocombenelux.nl”> to SpamSieve whitelist, added rule <From (name) Is Equal to “CD Geshev”> to SpamSieve whitelist, added to Good corpus (2226)
Date: 2008-02-22 14:16:35 +0100

Michael_Tsai · March 5, 2008, 8:06pm

Thanks.

It looks like a previous message with similar content was auto-trained as good, and you didn’t correct SpamSieve by training it as spam (or, more likely, you did correct it, but not until after the second message had been classified).
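For anyone wondering how a message with that subject line could end up at P(spam)=0.000, here is a rough sketch. It assumes a Graham-style naive Bayes combination of the per-token probabilities, which is an illustration only and not necessarily the formula SpamSieve uses; the `combine` function is a hypothetical helper. The token values are copied from the Reason line of the log entry quoted earlier in the thread.

```python
import math

# Token probabilities copied verbatim from the "Reason:" line in the log.
TOKEN_PROBS = {
    "F:Davidson": 0.998, "herbalist": 0.002, "S:Available!": 0.002,
    "bysex": 0.002, "all-night": 0.002, "thepenis": 0.002,
    "smalldick": 0.002, "S:Effective": 0.995, "S:World": 0.096,
    "throbbing": 0.138, "CT:iso-8859-2": 0.791, "S:Most": 0.761,
    "fuller": 0.242, "S:Now": 0.258, "endorsed": 0.267,
}

def combine(probs):
    """Assumed Graham-style combination, done in log-odds space for stability.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))

# All tokens from the log entry.
score = combine(TOKEN_PROBS.values())
print(f"{score:.3f}")  # prints 0.000

# Same message with the tokens the corpus rates as strongly "good" left out.
score_without_low = combine(p for p in TOKEN_PROBS.values() if p > 0.01)
print(f"{score_without_low:.3f}")
```

Under this assumed formula, the six tokens at 0.002 outweigh the two strongly spammy ones, and leaving them out pushes the score close to 1. That is consistent with similar content having been trained as good earlier.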

When SpamSieve thinks that a message is good, it automatically adds the sender to the whitelist. In this case, it thought the message contents were interesting, so it also added the message to the corpus. If it turns out that SpamSieve was wrong, and the message was spam, then when you train the message as spam, SpamSieve will undo all of this.

Looking at the log, one problem seems to be that there is a long delay (6 hours in the first case I saw) between when SpamSieve classified a spam message as good and when you corrected it. Due to the auto-training, for those 6 hours SpamSieve was working from incorrect information. This feeds back into more mistakes and more troublesome auto-training. If you cannot correct SpamSieve’s mistakes promptly, you should turn off auto-training.
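The auto-train and undo behavior described here can be pictured with a toy model. This is only a sketch under stated assumptions: the `Message` and `ToyFilter` classes and their method names are hypothetical and are not SpamSieve's internals, and the assumption that turning auto-training off skips both the whitelist and the corpus update is a simplification. The sample values come from the "Trained: Good (Auto)" log entry quoted earlier in the thread.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    identifier: str
    from_address: str
    from_name: str

@dataclass
class ToyFilter:
    """Hypothetical stand-in for a filter that auto-trains on its own predictions."""
    auto_train: bool = True
    whitelist: set = field(default_factory=set)
    good_corpus: dict = field(default_factory=dict)
    spam_corpus: dict = field(default_factory=dict)

    def predicted_good(self, msg, interesting=True):
        # Assumption: with auto-training off, nothing is learned from predictions.
        if not self.auto_train:
            return
        self.whitelist.add(("address", msg.from_address))
        self.whitelist.add(("name", msg.from_name))
        if interesting:
            self.good_corpus[msg.identifier] = msg

    def train_as_spam(self, msg):
        # User correction: undo the auto-training, then learn the message as spam.
        self.whitelist.discard(("address", msg.from_address))
        self.whitelist.discard(("name", msg.from_name))
        self.good_corpus.pop(msg.identifier, None)
        self.spam_corpus[msg.identifier] = msg

msg = Message("i5F1ZujULablJgI4LU81gQ==", "_nelli@autocombenelux.nl", "CD Geshev")
toy = ToyFilter()
toy.predicted_good(msg)

# Until the correction arrives, the sender is whitelisted and the message
# counts as good, so later messages are judged against incorrect data.
whitelisted_during_delay = ("address", msg.from_address) in toy.whitelist

toy.train_as_spam(msg)
whitelisted_after_correction = ("address", msg.from_address) in toy.whitelist
```

In this model the window between `predicted_good` and `train_as_spam` is the delay in question: the longer it lasts, the more incoming mail is classified against the wrong whitelist and corpus.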

Another problem is that until a few days ago you were using a version of SpamSieve from last April. Because of this and because SpamSieve’s corpus is rather old and large, it would probably help to reset the corpus and re-train SpamSieve with a smaller number of recent messages.

awy · March 6, 2008, 3:10am

Thanks for the info.

Any tips for retraining?

I think it will take a lot of hard work to be extremely selective about my “good” messages. I’m assuming accuracy will suffer if I select good messages only from a specific group of colleagues or something (?).

PS - I’m not sure what would cause the time delay you speak of. The only thing I can think of is that sometimes my mail program is receiving messages when I’m in a meeting or otherwise preoccupied. What’s the best approach, just closing my mail program when I’m not actively watching it? And if so, is that my only option?

Michael_Tsai · March 6, 2008, 6:13am

I would try to make a smart mailbox of messages that aren’t in the Spam mailbox, sort it by date, and choose messages from there. Try to get a good mix of different types of mail, but don’t spend too much time on it.

The other option is to turn off “Auto-train with incoming mail” in SpamSieve’s preferences.

awy · March 6, 2008, 6:33am

Thanks a bunch for the useful tips, once again.

Working these kinks out will be helpful, but all in all, without SpamSieve, my livelihood would be in serious trouble!