Synchronizing the training data between two machines


I have a spam-filtering drone set up, which works great. Unfortunately, from time to time the network connection to it goes down, and I cannot go and fix it for a couple of days. When this happens, I would like to use my laptop to access my messages with the same training data that I've been using on the drone. To keep the laptop up to date, I'm wondering the following:

  • What file contains the training data? Is it only Corpus.corpus?
  • Is this file only modified when trained with good or bad messages?
  • What about the Rules files?
  • Can I copy this/these files over while SpamSieve is running, or do I need to quit it first?
  • Do the preferences contain any training data?

Corpus.corpus and Rules.

No, it’s potentially modified for every message that’s filtered.

This is also potentially modified for every message.

You need to quit it before copying or replacing those files.


I have one quick follow-up question: are there any consistency concerns with other files? If I use the same corpus and rules on two computers and they diverge, can I just copy both files from one computer to the other?

Such as? Do you mean the log or statistics?


Yes, the log or the statistics, for instance. I want to make sure I don’t break everything by copying only these two files. But your second answer seems to confirm this.

The log and statistics will be inconsistent with the corpus and rules, but this won’t break anything since the former are only for display purposes.

Is there a way that I can use a cloud service (like Dropbox, MobileMe, or Amazon) to store the SpamSieve files so that I can use this magical program at home and at work? Assuming I remember to quit the mail program in the other location before using it?


The way to do this would be to manually copy the files to and from the cloud (while neither SpamSieve nor the mail program is running). Once SpamSieve is at a good accuracy level, you can use it independently at home and at work (assuming only one mail program is running at a time); there’s probably no need to keep synchronizing it.
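The manual copy can be scripted so that it refuses to run while the apps are open. Below is a minimal Python sketch, assuming the training data consists of the two items named in this thread (`Corpus.corpus` and `Rules`) sitting in one folder. The folder paths, the process names checked (`SpamSieve`, `Mail`), and the function names are hypothetical; the running-app check relies on the `pgrep` command.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the training items named in this thread; on-disk names may differ.
TRAINING_FILES = ["Corpus.corpus", "Rules"]

# Hypothetical process names; adjust to the mail program you actually use.
APPS_THAT_MUST_BE_QUIT = ["SpamSieve", "Mail"]


def is_running(app_name):
    """Return True if a process with exactly this name is running (via pgrep)."""
    result = subprocess.run(["pgrep", "-x", app_name], capture_output=True)
    return result.returncode == 0


def copy_training_data(src_dir, dst_dir, running_check=is_running):
    """Copy the training items from src_dir to dst_dir.

    Raises RuntimeError if any of the apps is still running, because the
    files must not be copied or replaced while SpamSieve is open.
    """
    busy = [app for app in APPS_THAT_MUST_BE_QUIT if running_check(app)]
    if busy:
        raise RuntimeError("Quit these apps first: " + ", ".join(busy))

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in TRAINING_FILES:
        source = src / name
        if not source.exists():
            raise FileNotFoundError(source)  # a missing item means an incomplete sync
        if source.is_dir():
            # Handle the case where an item is a folder rather than a single file.
            shutil.copytree(source, dst / name, dirs_exist_ok=True)
        else:
            shutil.copy2(source, dst / name)  # copy2 also preserves timestamps
        copied.append(name)
    return copied
```

To upload, you would call `copy_training_data(support_folder, cloud_folder)` before leaving one machine, and to download, `copy_training_data(cloud_folder, support_folder)` on the other, where both folder paths are placeholders for your own setup. The log and statistics are deliberately left out, since they are only for display.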

Also, thanks for answering the unasked question: is it OK to do this, license-wise? I had meant to add that to the question.


Here’s the full story about licenses.

