I have a spammer who has created a style that seems to pass the Bayesian classifier. I flag it as spam repeatedly, and I’m guessing that he changes something just enough to make it look different, often the email address, name, or subject. Is there something I can do to make it catch these? (I just noticed that the server junk filter calls it spam.)
Below is a copy and paste of the log entry. (1 of 4 for this specific message)
Summary: SpamSieve’s Bayesian classifier predicted this message to be Good based on a statistical analysis of its content.
Score: 4 (0 is least spammy; 100 is most spammy)
Words: ^a-style-padding15px40px(0.001), ^a-style-padding30px20px(0.001), CT:vault(0.002), H:X-Process-Key(0.002), V:Charleston(0.002), V:picnic(0.002), ^a-style-fontsize18pxfontweightboldcolorfffffftex(0.002), ramen(0.002), V:Mount(0.005), V:blanket(0.005), V:broth(0.005), tonkotsu(0.005), care!(0.071), U:events(0.074), A:Meat(0.110)
Accuracy: Correct (if this message is not Good, you should train it as Spam)
Help: Correct All the Mistakes
Superseded Prediction: SpamSieve classified this message again later.
Date Logged: Today at 10:22:37 AM
Subject: (HP) Membership Update: Thanks for stopping by Costco recently
From: Perk C0STC0 SpeciaI <perkcstcspeciai@freshtodays.com>
Date Sent: Today at 8:55:47 AM
Date Received: Today at 8:56:06 AM
To: cheryl@bakerstreetinn.com
Size: 16 KB (6 KB compressed)
Identifier: YguJ3c3QLOLXT/bxMFsSWg==
Server Filter: The server junk filter classified this message as spam.
Origin: Unknown
Contacts: 297
Excluded Contacts: 1
Good Messages: 14,418
Spam Messages: 14,603
Bias: 0.000
P(spam): 0.000[0.000]
Tokenizing Time: 0.011s
Processing Time: 0.282s
SpamSieve: 3.3 at /Applications/SpamSieve.app
Device: macOS 15.7.4 (24G517) on Cheryl’s MacBook Pro (MacBookPro15,1)
User: cherylmhockaday (Cheryl M Hockaday)
Language: English
If you look at the Subject you can see that it begins with “(HP).” Because my server (Host Papa) has such a lousy spam filter, I have set it to place that prefix in the subject line of any message that it calls spam and not to move those messages out of the Inbox, so I can try to clear any false positives.
This could indicate that there are other spam messages containing these words that were incorrectly trained as good. I recommend looking for them as described here.
As I look into it, I find that the third one in the list is in 2 false negative spam messages and also in a good message. Should I train those two as spam based on that with a good message in the mix?
Are you searching the Good Messages section of the Corpus window? Any spam messages that are listed as good should be trained as spam. The good ones should not be.
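To illustrate why a few mistrained messages matter, here is a minimal sketch of how a generic Bayesian filter might estimate a word's spam probability from corpus counts. This is not SpamSieve's actual formula; the function name `token_spam_probability` and the add-one smoothing are assumptions for illustration. The corpus sizes come from the log above, and the counts mirror the case of a word found in two spam messages and one good message.

```python
def token_spam_probability(spam_count, good_count, n_spam, n_good):
    """Estimate P(spam | word) from how often the word appears in each corpus.

    Generic textbook-style estimate with add-one smoothing (an assumption,
    not SpamSieve's real calculation).
    """
    spam_rate = (spam_count + 1) / (n_spam + 2)
    good_rate = (good_count + 1) / (n_good + 2)
    return spam_rate / (spam_rate + good_rate)


N_SPAM, N_GOOD = 14603, 14418  # corpus sizes from the log entry

# Two spam messages wrongly sitting in the good corpus, plus one real good message:
before = token_spam_probability(0, 3, N_SPAM, N_GOOD)

# After retraining the two spam messages as spam (the good one stays good):
after = token_spam_probability(2, 1, N_SPAM, N_GOOD)

print(f"before retraining: {before:.3f}")
print(f"after retraining:  {after:.3f}")
```

Under these assumptions the word leans good before the correction and leans spam afterward, even though the one genuinely good message is left alone.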
The prefixes have to do with where in the message the words were found. F means the From header, CT means the Content-Type header, and V means body text that may be invisible.
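If it helps when comparing several log entries, the Words line can be split into prefix, word, and probability with a short script. This is a rough sketch that assumes the format seen in the log above (items separated by a comma and space, each ending in a parenthesized probability). The function name `parse_words` is hypothetical, and only the three prefixes explained in this thread are labeled.

```python
import re

# Prefix meanings as stated in this thread; other prefixes seen in the log
# (H, U, A, ...) are not explained here, so they are reported as unknown.
KNOWN_PREFIXES = {
    "F": "From header",
    "CT": "Content-Type header",
    "V": "body text that may be invisible",
}

# Optional one- or two-letter uppercase prefix, then the word, then "(0.123)".
TOKEN_RE = re.compile(
    r"^(?:(?P<prefix>[A-Z]{1,2}):)?(?P<word>.+)\((?P<prob>[01]\.\d+)\)$"
)


def parse_words(line):
    """Split a 'Words:' log line into (prefix, word, probability) tuples."""
    body = line.split(":", 1)[1] if line.startswith("Words:") else line
    tokens = []
    for item in body.split(", "):
        match = TOKEN_RE.match(item.strip())
        if match:
            tokens.append(
                (match.group("prefix"), match.group("word"), float(match.group("prob")))
            )
    return tokens


sample = "Words: CT:vault(0.002), V:picnic(0.002), ramen(0.002), care!(0.071)"
for prefix, word, prob in parse_words(sample):
    where = KNOWN_PREFIXES.get(prefix, "unknown or no prefix")
    print(f"{word:10} {prob:.3f}  {where}")
```

Grouping the parsed words by prefix makes it easier to see which ones come from possibly invisible body text (V) and to search the corpus for them.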