The corpus contains the messages that you’ve trained SpamSieve with (using the “Train as Spam” and “Train as Good” commands), as well as the messages that SpamSieve has trained itself with using auto-training. You can see the corpus numbers in the Statistics window. There you will also see some numbers for “Filtered Mail.” These are the incoming messages that SpamSieve has processed (that is, classified as good or spam), but it was probably trained with only some of them. I think this difference between the corpus and the filtered mail may be the key to your misunderstanding.
So, for the initial training, you are supposed to use 1,000 or fewer messages, 65% of them spam. After that, you don’t need to worry about the corpus size or ratio. You only need to use the training commands if SpamSieve puts a good message in the spam folder or a spam message in your inbox.
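To make the arithmetic for the initial training concrete, here is a small sketch in Python. It is only an illustration of the 1,000-message limit and 65% spam ratio mentioned above; the function name and parameters are made up for this example and are not part of SpamSieve.

```python
def initial_training_split(total_messages, spam_percent=65, max_messages=1000):
    """Return (spam_count, good_count) for an initial training set.

    Hypothetical helper for illustration only; SpamSieve has no such function.
    The defaults reflect the guideline of 1,000 or fewer messages, 65% spam.
    """
    if not 0 < total_messages <= max_messages:
        raise ValueError(f"use between 1 and {max_messages} messages")
    # Integer arithmetic, rounding to the nearest whole message.
    spam_count = (total_messages * spam_percent + 50) // 100
    good_count = total_messages - spam_count
    return spam_count, good_count


print(initial_training_split(1000))  # (650, 350)
print(initial_training_split(400))   # (260, 140)
```

So with the full 1,000 messages you would train with about 650 spam and 350 good messages, and with a smaller set of 400 you would use about 260 spam and 140 good.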