Run SpamSieve Directly on a Mail Server

Sebby · December 15, 2023, 6:21pm

I am giving serious thought to moving my mail server onto a Linux VM running on my Mac Mini, and it occurred to me that it would be very awesome if the spam filtering and training could be done in a UI that’s accessible directly on that machine (or remotely with screen sharing), using my existing corpus and my iCloud account. Obviously SpamSieve comes to mind: the mail to be classified would be fed to it, and any training would be handled by a running copy of Mail on the server machine, as you do.

Now the question is, what interfaces are available for classifying and training? Is it just the AppleScript interface? Is there a command-line tool, or a socket, that plug-ins or the like can reach for this purpose? If so, I’d write a small helper that runs in the Linux VM and communicates with the SpamSieve running on the Mac, perhaps over a network socket, or perhaps even by using ssh to execute a command remotely. I guess the first order of business is to establish what options there are. The goal is to avoid running a redundant spam filter inside the guest and to take advantage of my SpamSieve corpus. Just using Mail.app is acceptable for training, but because that won’t be immediate, I’d like a faster approach for classifying incoming messages so I don’t get notified about spam.

Any ideas?

Michael_Tsai · December 15, 2023, 6:52pm

SpamSieve does have an undocumented socket that it uses to receive commands from Apple Mail (pre-Sonoma). I’m not sure that you can reach this from your VM, though, since it uses INADDR_LOOPBACK.

You should certainly be able to use ssh to invoke AppleScript commands remotely.

Sebby · December 15, 2023, 7:12pm

Thanks.

It’s sensible that the socket is bound to loopback by default, though if you documented the protocol you could perhaps consider an option to open it up to all clients (with appropriate caveats). Even so, a simple proxy on the Mac could easily make this possible without any change to SpamSieve. I would certainly be interested in documentation for the protocol, since I could then write a client in the guest that talks to it via the proxy (or, indeed, via SSH port forwarding). Sockets are certainly preferable to spawning a new process for every delivery.
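To illustrate the proxy idea, here is a minimal sketch in Python using asyncio, assuming it runs on the Mac next to SpamSieve. The names `pipe` and `serve_proxy`, and the addresses in the usage comment, are made up for illustration; nothing here is part of SpamSieve. It simply relays bytes between a listening socket and a port on loopback, so it should be bound only to an interface the VM can reach, not to every interface.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until the reading side hits EOF."""
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    # Pass the end-of-stream on so the other side knows no more data follows.
    if writer.can_write_eof():
        writer.write_eof()


async def serve_proxy(listen_host, listen_port, target_port, target_host="127.0.0.1"):
    """Start a TCP relay from (listen_host, listen_port) to (target_host, target_port).

    Returns the asyncio server object; the caller decides how long to keep it running.
    """

    async def handle(client_reader, client_writer):
        upstream_reader, upstream_writer = await asyncio.open_connection(
            target_host, target_port
        )
        try:
            # Relay both directions at once; errors on one side are collected
            # rather than raised so that cleanup below always runs.
            await asyncio.gather(
                pipe(client_reader, upstream_writer),
                pipe(upstream_reader, client_writer),
                return_exceptions=True,
            )
        finally:
            client_writer.close()
            upstream_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)


# Example use (hypothetical addresses and ports):
#
# async def main():
#     server = await serve_proxy("192.168.64.1", 8025, target_port=54321)
#     async with server:
#         await server.serve_forever()
#
# asyncio.run(main())
```

Because the relay is protocol-agnostic, it would need no changes whatever the socket turns out to speak. The target port has to be supplied by whoever starts the proxy.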

Michael_Tsai · December 15, 2023, 11:11pm

I don’t have the bandwidth at the moment to figure out whether/how this should be opened up. But what I can do now is share with you how it works currently, with the understanding that this could change (though I don’t intend for it to anytime soon).

As I said, it’s listening on loopback (127.0.0.1). The port number changes at each launch of SpamSieve and is written to the file ~/Library/Mail/SpamSieve/ServerInfo.plist. SpamSieve will only write this file if it has Full Disk Access (FDA), and whatever reads the file will need FDA as well.
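For readers who want to try this, reading the port could look roughly like the sketch below, using Python's standard plistlib. The key name "port" is a guess, since the layout of ServerInfo.plist is not described here; inspect the actual file and adjust the `key` argument. The function name `read_spamsieve_port` is likewise just illustrative.

```python
import plistlib
from pathlib import Path

# Path given in the post above; reading it requires Full Disk Access.
SERVER_INFO = Path.home() / "Library/Mail/SpamSieve/ServerInfo.plist"


def read_spamsieve_port(path=SERVER_INFO, key="port"):
    """Return the port number stored in the plist at `path`.

    ASSUMPTION: the key name "port" is hypothetical. Check the real file
    for the key that actually holds the port number.
    """
    with open(path, "rb") as f:
        info = plistlib.load(f)  # handles both XML and binary plists
    if key not in info:
        raise KeyError(f"{key!r} not found; keys present: {sorted(info)}")
    return int(info[key])
```

Since the port changes at each launch of SpamSieve, a helper would want to re-read the file whenever a connection is refused rather than caching the value indefinitely.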

You can POST the raw message data that you want SpamSieve to classify to http://127.0.0.1:port/score. This will return a property list that either has a score (50 and higher means spam) or an error.

You can use the headers SpamSieve-Application-Name and SpamSieve-Application-Version so that the log shows which app/tool the message came from.
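Putting the above together, a client might look something like this sketch, which uses only Python's standard library. Several things are assumptions and not confirmed here: the Content-Type value, the app name and version strings, the helper names (`build_score_request`, `score_message`, `is_spam`), and the key names inside the returned property list, which is why `score_message` returns the parsed plist as-is instead of picking out a field.

```python
import plistlib
import urllib.request

SPAM_THRESHOLD = 50  # per the post above: 50 and higher means spam


def build_score_request(raw_message, port, app_name="vm-mail-helper", app_version="0.1"):
    """Build the POST request for /score without sending it."""
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/score",
        data=raw_message,  # raw message bytes, headers and body
        method="POST",
        headers={
            # These two headers only affect what the SpamSieve log shows.
            "SpamSieve-Application-Name": app_name,
            "SpamSieve-Application-Version": app_version,
            # ASSUMPTION: the expected Content-Type is not documented here.
            "Content-Type": "application/octet-stream",
        },
    )


def score_message(raw_message, port, timeout=30.0):
    """POST a message and return the parsed property list from the response.

    The plist's key names are not documented here, so inspect the returned
    value to find the score or the error.
    """
    request = build_score_request(raw_message, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return plistlib.loads(response.read())


def is_spam(score):
    """Apply the threshold from the post: 50 and higher means spam."""
    return score >= SPAM_THRESHOLD
```

Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so if SpamSieve reports errors with an HTTP error status, the caller would need to catch that and parse the error body separately.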

Sebby · December 15, 2023, 11:32pm

Awesome, thanks! I’ll give this a go when I next get the chance, probably after the server is up and receiving mail.