Stock spam immune to SpamSieve?

system · August 31, 2006, 6:39am

A lot of the spam I receive consists of a jpeg touting some junk stock and a bunch of semirandom text, and SpamSieve almost never recognizes these as spam, no matter how many times I manually flag them as such. What’s the magic trick that gets these things past SpamSieve (and Tuffmail’s spam filters, for that matter)? It’s like they’re immune to spam filtering.

Michael_Tsai · August 31, 2006, 6:49am

In my experience, SpamSieve has no trouble catching that kind of spam. Please report them so that I can see if there’s a setup problem on your end or if I need to make any adjustments to SpamSieve.

system · September 1, 2006, 8:11am

Will do, thanks.

hedgman · October 7, 2006, 4:32pm

Same problem
I am getting the same kind of spam. Daily. I have checked the appropriate preference to save them and will report them to you per your instructions after I have a couple. It is the only stuff that slips through. I love SS.
hedgman

hedgman · October 25, 2006, 12:07pm

This works great on my system.
A suggestion from Michael Tsai: Try creating a blocklist rule in SpamSieve:

Body Matches Regex <body bgcolor="#ffffff" text="#000000">\s<img
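To see what a pattern like this would and would not catch, here is a small sketch in Python. It only illustrates the regular expression itself: it assumes the rule is applied as a search anywhere in the HTML body and that SpamSieve's regex engine treats this simple pattern the same way Python's `re` module does. The sample bodies are made up for the example.

```python
import re

# Pattern text copied from the rule above. Python's re module is used here
# only for illustration; SpamSieve's own regex engine is assumed (not
# confirmed) to treat this simple pattern the same way.
RULE = re.compile(r'<body bgcolor="#ffffff" text="#000000">\s<img')


def body_matches(body: str) -> bool:
    """Return True if the pattern occurs anywhere in the message body."""
    return RULE.search(body) is not None


# Hypothetical message bodies, invented for this example.
image_only = '<html><body bgcolor="#ffffff" text="#000000">\n<img src="cid:part1"></body></html>'
text_first = '<html><body bgcolor="#ffffff" text="#000000">\n<p>Hello</p><img src="cid:logo"></body></html>'
crlf_body = image_only.replace("\n", "\r\n")

print(body_matches(image_only))  # image directly after the body tag
print(body_matches(text_first))  # text between the body tag and the image
print(body_matches(crlf_body))   # two whitespace characters before <img
```

Note that `\s` matches exactly one whitespace character, so a body with a CRLF line ending (or no line break at all) between the two tags would not match. If the rule misses some of these messages, a looser variant such as `\s*` in place of `\s` might be worth trying.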

pjmolite · October 25, 2006, 7:39pm

I’m Getting Them Too!
I’m getting the same penny stock spam sometimes twice a day. It’s a .GIF file that’s attached. No one sends me those so I tried to add the rule “any attachment ends with” in the blocklist but it will not work. I’m using Apple Mail, so I’m trying a second rule to block (it seems to work but I have to wait and see). My Network Solutions web mail doesn’t block the attachment either so there must be more to this than I know.

I’ll keep you posted on this. I’m new here, using SpamSieve for about six months. At least I know I’m not alone!

Michael_Tsai · October 25, 2006, 7:56pm

It sounds like that should have worked. Please send me some information so that I can help you further.

pjmolite · October 27, 2006, 12:04am

I Think I Resolved It.
I think I resolved my problem. I re-trained SpamSieve from scratch (deleting everything), but this time applied Apple Mail rules in groups of 500 messages. Possibly, SpamSieve was over-trained initially. Also, the SpamSieve rule was not listed correctly. I read the manual more carefully.

Thanks.

Michael_Tsai · October 30, 2006, 5:55pm

SpamSieve 2.5 should be much better at catching these spams.

garybx · November 2, 2006, 1:49pm

I just read a discussion on macosxhints (http://www.macosxhints.com/article.php?story=20061031221923601) about how to handle these new-fangled spam messages where they use an image to contain the “text” of the advertisement. Basically, they recommend using a rule which looks for messages that have an attached GIF file, do NOT come from a person in my address book, and have a Content-Type of “multipart/related”. That seems to be a good rule, but I notice that the new SpamSieve 2.5 mentions that it is improved in recognizing image messages (which I assume are the same as we’re talking about here).
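For anyone who wants to see that rule spelled out, here is a rough sketch of its three conditions in Python, using only the standard library `email` package. This is not the Apple Mail rule from the hint and has nothing to do with how SpamSieve works internally; the function names, the `address_book` set (assumed to hold lowercase addresses), and the sample message are all hypothetical.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import parseaddr


def matches_hint_rule(raw_message: str, address_book: set[str]) -> bool:
    """Check the three conditions described in the hint (illustrative only)."""
    msg = message_from_string(raw_message)
    # Condition 1: the sender is NOT in the address book.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in address_book:
        return False
    # Conditions 2 and 3: some part is multipart/related and some part is a GIF.
    types = {part.get_content_type() for part in msg.walk()}
    return "multipart/related" in types and "image/gif" in types


def build_sample(sender: str) -> str:
    """Build a made-up HTML message with one inline GIF part."""
    msg = MIMEMultipart("related")
    msg["From"] = sender
    msg["Subject"] = "sample"
    msg.attach(MIMEText('<html><body><img src="cid:img1"></body></html>', "html"))
    msg.attach(MIMEImage(b"GIF89a", _subtype="gif"))  # placeholder bytes, not a real image
    return msg.as_string()


known = {"friend@example.com"}
print(matches_hint_rule(build_sample("stranger@example.com"), known))
print(matches_hint_rule(build_sample("Friend <friend@example.com>"), known))
```

The check looks only at message structure and the sender, so any message with an inline GIF from someone outside the address book would match it, whether or not it is spam.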

The folks at macosxhints were concerned that training a spam program such as yours to reject these would be counterproductive since they frequently also include random text with “good” words.

Here’s the relevant part of the referenced discussion:

------------ start of quoted message --------------
This is less a hint about improving accuracy than one about not deteriorating the accuracy of your junk mail filter.

Most email spam filters classify spam based on word frequency. When you train the filter, you give the filter a list of bad words. If a particular bad word comes up frequently it increases the likelihood that that email is spam. Then when an email comes in with enough bad words it is classified as spam.

The .gif emails that are currently going around do not get filtered well by spam filters. These emails contain two parts: the first part, the .gif, contains the text that the filter would normally learn to trigger off. Since this text is in the image, the filter can’t see it. Instead the filter sees the second part of the email: a list of random phrases and words. The filter picks up these words and calculates a spam score from them.

The best thing to do with these spams is simply delete them and not try to train your filter with them. If you do have your filter learn the words in these messages, it will only be learning common words, which will skew its results. Use a rule to highlight these emails or to move them to your spam folder, as explained in this hint; it will maintain the integrity of your spam filter.
------------ end of quoted message --------------
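To make the quoted description concrete, here is a toy word-frequency filter in Python. It is a deliberately simplified sketch of the general idea the quote describes, not SpamSieve's algorithm or any real product's; the class name, the smoothing choice, and the training messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class ToyWordFilter:
    """Minimal word-frequency spam scorer (illustrative only)."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()
        self.good_counts: Counter = Counter()
        self.spam_total = 0
        self.good_total = 0

    def train(self, text: str, is_spam: bool) -> None:
        words = set(tokenize(text))  # count each word once per message
        if is_spam:
            self.spam_counts.update(words)
            self.spam_total += 1
        else:
            self.good_counts.update(words)
            self.good_total += 1

    def word_spamminess(self, word: str) -> float:
        # Add-one smoothing keeps the result strictly between 0 and 1.
        p_spam = (self.spam_counts[word] + 1) / (self.spam_total + 2)
        p_good = (self.good_counts[word] + 1) / (self.good_total + 2)
        return p_spam / (p_spam + p_good)

    def score(self, text: str) -> float:
        # Combine per-word evidence as summed log-odds, then map to 0..1.
        log_odds = 0.0
        for word in set(tokenize(text)):
            p = self.word_spamminess(word)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


f = ToyWordFilter()
f.train("buy cheap pills now", is_spam=True)
f.train("meeting notes attached for the project", is_spam=False)

print(f.score("cheap pills"))            # words seen only in spam
print(f.score("project meeting notes"))  # the visible text of an image spam might look like this
```

In this toy model, a message whose only readable text is ordinary words scores as non-spam, because the advertisement inside the image is never tokenized. That is the effect the quoted hint describes; a real filter has more to work with than this sketch does.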

Do you have an opinion about this, i.e., whether your spam filter’s accuracy is degraded by flagging this particular type of email as spam (due to the good words included)?

Michael_Tsai · November 2, 2006, 4:51pm

I don’t recommend making a rule like MacOSXHints says, since non-spam messages with “multipart/related” are common. SpamSieve 2.5 uses a variety of new techniques to recognize image spams, but in order for it to do so, you must train it with them. In fact, I recommend that you always train SpamSieve with spam messages that it missed. Omitting messages because you think they might be counterproductive will seriously decrease SpamSieve’s accuracy, because it’s like telling it that you do want those messages in your inbox.

garybx · November 2, 2006, 7:31pm

Thanks for the advice. I will continue to let SpamSieve do its job. In fact, since it is currently telling me that its corpus is larger than necessary, I am going to reset it and retrain using the 600-700 spam messages I’ve received this week!