Too much spam to train?

djhomeless · January 1, 2007, 5:36am

Hi All,
I’m trying out SpamSieve on an account and have run into a bit of a problem.

The account I’m training SS for is a particularly nasty one. It’s been around for over a decade, and roughly 95% of the inbound email I get is spam.

In any case, I’m concerned that every time I mark messages as spam, I get the warning from SS that my corpus contains too many spam messages. The readme says that my corpus should contain about 65% spam messages. However, other than holding back on training spam to even out the balance, I’m not sure what else to do?

Thanks in advance,

Geoffrey

Michael_Tsai · January 1, 2007, 7:52am

I doubt that this will be a problem. The recommended ratio is for the initial training. You should use a selection of your stored spam and good messages (probably not all of your spam, since you have so much) to get about 65% spam. After that, SpamSieve will auto-train itself using the incoming messages, and it will try to maintain the proper ratio automatically. From then on, you only need to manually train messages that SpamSieve puts in the wrong mailbox. This probably won’t be enough messages to throw off the ratio, but if it is, that’s OK.
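
If it helps to see the arithmetic behind the 65% figure, here is a rough sketch in Python. It is not part of SpamSieve and does not talk to it; the function names `spam_needed` and `spam_ratio` are made up for illustration, and the only thing it assumes is the "about 65% spam" target mentioned above.

```python
def spam_needed(good_count, target_ratio=0.65):
    """Estimate how many spam messages to train alongside `good_count`
    good messages so that spam is roughly `target_ratio` of the corpus.

    Hypothetical helper for illustration only, not a SpamSieve API.
    Solves spam / (spam + good) = target_ratio for spam.
    """
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be between 0 and 1")
    return round(good_count * target_ratio / (1 - target_ratio))


def spam_ratio(spam_count, good_count):
    """Return the fraction of the corpus that is spam (0.0 if empty)."""
    total = spam_count + good_count
    return spam_count / total if total else 0.0


if __name__ == "__main__":
    good = 350  # example figure: good messages available for initial training
    spam = spam_needed(good)
    print(f"Train about {spam} spam with {good} good messages "
          f"({spam_ratio(spam, good):.0%} spam)")
```

So with 350 good messages on hand, you would pick roughly 650 of your stored spam for the initial training and leave the rest out.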