C-Command Software Forum

Not catching all "watches" spam

For a while now, my SpamSieve installation hasn’t been catching spam about watches very well. Previously, I was using PowerMail, and next to none of it was caught. Now, after migrating to a new Mac this past week, I’ve reset the corpus, set up SpamSieve to work with Mail instead, and started training.

Now, after about 280 messages trained at 71% spam (I’m working on getting some good messages to bring that ratio down into the 60s), SpamSieve seems to be catching a lot more of this stupid spam about watches, but several still get through. I have my spam-catching strategy set right in the middle, but thought I’d run it past people here for an opinion before I increase it toward “aggressive.”

Here are the entries from the log for the latest spam about watches:

Predicted: Good (27)
Subject: beautiful watches
From: zenith@earthlink.net
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Reason: P(spam)=0.130[0.500], bias=0.000, S:watches(0.999), rolex(0.999), rolex(0.999), run(0.002), replica(0.998), sell(0.002), replica(0.998), sell(0.002), run(0.002), cases(0.002), browse(0.002), cases(0.002), bands(0.998), bands(0.998), 1000(0.005)
Date: 2009-08-04 05:30:18 -0500

Trained: Good (Auto)
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Actions: added rule <From (address) Is Equal to "zenith@earthlink.net"> to SpamSieve whitelist, added rule <From (name) Is Equal to “Chanel Watches”> to SpamSieve whitelist, added to Good corpus (65)
Date: 2009-08-04 05:30:18 -0500

Trained: Spam (Manual)
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Actions: disabled rule <From (address) Is Equal to "zenith@earthlink.net"> in SpamSieve whitelist, disabled rule <From (name) Is Equal to “Chanel Watches”> in SpamSieve whitelist, added rule <From (address) Is Equal to "zenith@earthlink.net"> to SpamSieve blocklist, added rule <From (name) Is Equal to “Chanel Watches”> to SpamSieve blocklist, added to Spam corpus (163), removed from Good corpus (64)
Date: 2009-08-04 05:30:45 -0500

Mistake: False Negative
Subject: beautiful watches
Identifier: 4F0kXYLPGElWIa6wc1L2fA==
Classifier: Bayesian
Score: 27
Date: 2009-08-04 05:30:51 -0500
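
For anyone wondering how a message full of 0.999 words can still come out as Good, here is a rough sketch. It assumes a plain Graham-style naive Bayes combination of the per-word probabilities from the Reason line above. That is an assumption, not necessarily what SpamSieve does internally (the log reports 0.130, and this sketch does not reproduce that figure), and the helper name combine_naive_bayes is made up for illustration. It only shows that the eight strongly good tokens outweigh the seven strongly spammy ones.

```python
import math

# Per-token probabilities copied, in order, from the "Reason:" line above.
TOKEN_PROBS = [
    0.999, 0.999, 0.999,  # S:watches, rolex, rolex
    0.002, 0.998, 0.002,  # run, replica, sell
    0.998, 0.002, 0.002,  # replica, sell, run
    0.002, 0.002, 0.002,  # cases, browse, cases
    0.998, 0.998, 0.005,  # bands, bands, 1000
]


def combine_naive_bayes(probs):
    """Hypothetical helper: Graham-style combination of token probabilities.

    Works in log-odds space to avoid underflow. This is an assumed
    formula, not SpamSieve's documented method.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


spammy = [p for p in TOKEN_PROBS if p > 0.5]
hammy = [p for p in TOKEN_PROBS if p < 0.5]
combined = combine_naive_bayes(TOKEN_PROBS)

print(len(spammy), "spammy tokens,", len(hammy), "good tokens")
print("combined P(spam) under this assumption:", round(combined, 3))
```

Under this assumption the combined value lands well below the 0.500 cutoff shown in the log, mostly because words like run, sell, and cases have so far only been seen in good mail.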

The word “watch” and forms of it in the corpus look like this:


Word	Spam	Good	Total	Prob.	Last Used
watch	14	6	20	0.482	8/4/09	
watchers	0	1	1	0.005	8/2/09	
powerwatchers.com	0	1	1	0.005	8/2/09	
powerwatch	0	1	1	0.005	8/2/09	
MI:forums.powerwatchers.com	0	1	1	0.005	8/2/09	
watchshop	2	0	2	0.998	8/2/09	
watches	17	0	17	1.000	8/4/09	
U:watch	1	0	1	0.995	8/2/09	
U:proudwatches	1	0	1	0.995	8/2/09	
U:logoswatches	1	0	1	0.995	8/2/09	
U:graandwatches	2	0	2	0.998	8/2/09	
S:watches	8	0	8	0.999	8/4/09	
S:watch?	2	0	2	0.998	8/2/09	
S:watch!	1	0	1	0.995	8/2/09	
S:watch	2	0	2	0.998	8/4/09	
rep1icawatches	2	0	2	0.998	8/2/09
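
As a sanity check on the Prob. column, here is a minimal sketch of one plausible reading: the word's frequency in spam messages divided by the sum of its frequencies in spam and good messages. The formula and the helper name raw_word_prob are assumptions rather than anything taken from SpamSieve's documentation, and the corpus sizes are the 163 spam and 65 good messages from the stats in this post.

```python
# Corpus sizes from the stats in this post (assumed to match the corpus
# at the time the word table was copied).
SPAM_MESSAGES = 163
GOOD_MESSAGES = 65


def raw_word_prob(spam_count, good_count,
                  n_spam=SPAM_MESSAGES, n_good=GOOD_MESSAGES):
    """Hypothetical helper: relative spam frequency of a word.

    Assumed formula, not taken from SpamSieve's documentation.
    """
    spam_freq = spam_count / n_spam
    good_freq = good_count / n_good
    return spam_freq / (spam_freq + good_freq)


# "watch" row: 14 spam, 6 good, listed Prob. 0.482
print(round(raw_word_prob(14, 6), 3))  # prints 0.482
```

This matches the "watch" row. Words seen only in spam or only in good mail would come out as exactly 1.0 or 0.0 under this formula, whereas the table lists values such as 0.995, 0.998, and 0.005, so some kind of smoothing is evidently applied on top; nothing in the post says which.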

Stats since 1/1/2008 are:

Filtered Mail
7,837 Good Messages
39,755 Spam Messages (84%)
68 Spam Messages Per Day

SpamSieve Accuracy
8 False Positives
611 False Negatives (99%)
98.7% Correct

Corpus
65 Good Messages
163 Spam Messages (71%)
14,071 Total Words

Rules
3,457 Blocklist Rules
10,495 Whitelist Rules

Showing Statistics Since
1/1/08 12:01 AM
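
The percentages above can be re-derived from the raw counts. This is a small sketch with two assumptions: that the "(99%)" next to the false negatives means their share of all mistakes, and that "the 60s" means a target of roughly 65% spam in the corpus (an illustrative figure, not one stated in the post). All variable names are made up for the example.

```python
import math

# Counts copied from the stats above.
good_filtered = 7837
spam_filtered = 39755
false_positives = 8
false_negatives = 611
corpus_good = 65
corpus_spam = 163

total = good_filtered + spam_filtered
mistakes = false_positives + false_negatives

spam_share = spam_filtered / total                 # listed as 84%
accuracy = 1 - mistakes / total                    # listed as 98.7% correct
fn_share = false_negatives / mistakes              # listed as 99% (assumed meaning)
corpus_spam_share = corpus_spam / (corpus_good + corpus_spam)  # listed as 71%

print(f"{spam_share:.0%} {accuracy:.1%} {fn_share:.0%} {corpus_spam_share:.0%}")

# Good messages still needed to bring the corpus down to an assumed 65% spam.
TARGET_SPAM_SHARE = 0.65  # illustrative target, not from the post
needed_total = math.ceil(corpus_spam / TARGET_SPAM_SHARE)
extra_good = needed_total - corpus_spam - corpus_good
print("additional good messages needed:", extra_good)
```

With these counts, about two dozen more good messages would get the corpus to the assumed 65% target, provided no further spam is trained in the meantime.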

Anything else needed?

It sounds like your re-training is off to a pretty good start. I wouldn’t worry about the details or change the aggressiveness until after the corpus has built up more and adjusted to the spams that are getting through.