Training Question

Homer712 · March 8, 2024, 4:21pm

I’m trying to get a clarification on something I’ve read in the user manual. Where the manual explains how to do an initial training, its example uses up to 1000 messages, of which approximately 65% should be spam. After reading that section (and having run SpamSieve for about a week) I took a look at my statistics and I had a spam percentage of something under 10%. I don’t get many spam messages, but want to filter out all that I do get. Thinking that the percentage was way off, I reset the Corpus.

Now, after thinking about it a bit more, I’m thinking that the percentages in the Corpus and the numbers in Statistics really have nothing to do with the initial training example (1000 messages, of which 65% should be spam), and that the percentages of good messages versus spam messages will change quite a bit over time. So, by resetting the Corpus, I really accomplished nothing other than setting my SpamSieve training back one week.
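For readers who want to check the arithmetic behind the ratio discussed above, here is a minimal sketch. It is not part of SpamSieve; the function names and the sample counts (90 good, 10 spam) are hypothetical and chosen only for illustration. It computes the spam percentage of a corpus and how many additional spam messages would be needed to reach a target percentage, assuming the number of good messages stays fixed.

```python
def spam_percentage(good: int, spam: int) -> float:
    """Return the share of spam in the corpus, as a percentage."""
    total = good + spam
    if total == 0:
        return 0.0
    return 100.0 * spam / total


def spam_needed_for_target(good: int, spam: int, target_pct: int = 65) -> int:
    """Return how many more spam messages are needed to reach target_pct,
    assuming no further good messages are added (an illustrative assumption)."""
    if not 0 <= target_pct < 100:
        raise ValueError("target_pct must be in the range 0..99")
    # Solve (spam + x) / (good + spam + x) >= target_pct / 100 for x,
    # using integer arithmetic to avoid floating-point rounding.
    numerator = target_pct * (good + spam) - 100 * spam
    denominator = 100 - target_pct
    if numerator <= 0:
        return 0  # already at or above the target
    return -(-numerator // denominator)  # ceiling division


# Hypothetical corpus: 90 good messages, 10 spam messages.
print(spam_percentage(90, 10))         # 10.0
print(spam_needed_for_target(90, 10))  # 158
```

With these made-up counts, a corpus at 10% spam would need 158 more spam messages to reach 65%, which shows why a low-spam mailbox can take a long time to approach that ratio through manual training alone.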

Michael_Tsai · March 8, 2024, 5:06pm

How many good and spam messages did you use for the initial training? How many of each did it show in the corpus before the reset?

The corpus percentages and the initial training are related, and the percentage normally won’t change much over time (unless it starts out very unbalanced) because SpamSieve will try to maintain the proper ratio for you.

So the reset probably did just set you back, unless you started out with a very large and unbalanced initial training.

Homer712 · March 8, 2024, 7:32pm

Michael,

When your email got to me, I had four other new emails in my inbox. I read them all and then checked both Statistics and Corpus; this was the first email check since the Corpus reset this morning. I’ve attached screenshots of both. I did not use the SpamSieve menu icon to mark any of the emails, I just clicked and read them. So, Statistics shows 6 good emails, and Corpus shows 5 good emails, yet I didn’t train SpamSieve with any of them. So it looks to me that somehow, just reading an email gets it registered with SpamSieve as good.

What am I missing?

Henry

Michael_Tsai · March 8, 2024, 9:11pm

This is probably due to the Auto-train as needed feature. This is how SpamSieve learns new messages if you haven’t trained it with many yet and also how it tries to keep the corpus ratio balanced if you train it with too many of one type.

Homer712 · March 8, 2024, 10:06pm

Michael,

So, I found the Auto-train as needed option in the settings. Seeing as I’m really not being overwhelmed by spam, it should be safe to turn this setting off and just have SpamSieve rely on my selecting either good or spam from its menu. And I think I can keep the ratio pretty close to the desired 65% to 70% spam.

Thanks for the explanations, I think I’m starting to better understand (and becoming more comfortable with) how SpamSieve operates.

Henry

Homer712 · March 10, 2024, 4:21pm

So, after doing some more reading of the user manual (more specifically the section on multiple Macs) I decided to go with the drone setup. I uninstalled SpamSieve from my 2014 Mac mini and followed the instructions for setting up the MacBook Pro as the drone, as well as turning off the Auto-train as needed setting. I created the TrainGood and TrainSpam mailboxes in the Yahoo account and it all seems to be working correctly. It’s the weekend so the spammers are quiet, but I have no reason to think that things won’t work as expected once the workweek begins.