RegEx for Blocklisting From header with ill-formed email addresses

Feste · June 25, 2013, 5:59pm

I wrote a RegEx that is supposed to identify emails that have illegal whitespace characters inside the From address. I get lots of spam that fills this header field with things like “support@stu pids pamme rs.com”. The spaces are inserted to keep spam filters from harvesting the email address and adding it to blocklists. For example, Roadrunner lets me add email addresses to a blocklist, but when I click on this kind of SPAM, it hiccups and tells me that there are no valid email addresses to blocklist. Here is the RegEx:
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9]+[\s]+[A-Za-z0-9]+.[A-Z]{2,6}\b
I tested this on a free online RegEx engine and it worked like a charm, but when I create a regex rule for “From (address)” in SpamSieve, it doesn’t catch the spam that matches this expression.

Next, I tried to run this RegEx code through the “Import-Blocklist-Regex-Rules.scpt” that I found in the forums. It generated the following rule:
(?-i)\b[A-Za-z0-9._%+-]+@[A-Za-z0-9]+[\s]+[A-Za-z0-9]+.[A-Z]{2,6}\b
It’s almost identical, but it adds “(?-i)” before the word boundary (\b) at the start of my expression. I added this version to my blocklist, and it still won’t blocklist SPAM with matching From addresses.
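One reason the imported rule can fail is visible outside SpamSieve. The sketch below uses Python's `re` module as a stand-in for SpamSieve's regex engine (an assumption; engines differ in details), with “±” restored to “+-” and the stray “]” repaired to “[\s]+”. The “(?-i)” prefix switches case-insensitive matching off; Python is case-sensitive by default, so the imported rule is modeled by compiling without `re.IGNORECASE`. In that mode `[A-Z]{2,6}` only accepts uppercase letters, so a lowercase TLD like “com” can never satisfy it.

```python
import re

# Feste's pattern, with "±" restored to "+-" and "\s]+" repaired to "[\s]+".
# Python's re stands in for SpamSieve's engine here (an assumption).
PATTERN = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9]+[\s]+[A-Za-z0-9]+.[A-Z]{2,6}\b"

# "(?-i)" turns case-insensitivity off; Python's default is already
# case-sensitive, so compiling without re.IGNORECASE models the imported rule.
case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

sample = "support@stu pids.com"

# [A-Z]{2,6} cannot match the lowercase "com", so nothing is found.
print(case_sensitive.search(sample))
# Ignoring case, the same pattern matches the whole sample address.
print(case_insensitive.search(sample).group(0))
```

The first line prints `None` and the second prints the full sample address, so the same text can pass an online tester (often case-insensitive) and still fail the case-sensitive rule.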

Can anyone here tell me the SpamSieve regex formulation that will do what I’m trying to do? Alternatively, does anybody have a better “ill-formed” address RegEx filter that they’ve verified works with SpamSieve?

Thanks much!
Feste

Michael_Tsai · June 25, 2013, 6:21pm

It doesn’t look to me like this regex should match your example of “support@stu pids pamme rs.com” because the regex only allows for one run of inserted spaces in the domain. So it would only match something like “support@stu pids.com”.

You could modify it like this:

\b[A-Za-z0-9._%+-]+\@([A-Za-z0-9]+[\s]+)+[A-Za-z0-9]+\.[A-Z]{2,6}\b

This would match an e-mail address that contains one or more runs of spaces.
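The effect of wrapping “letters followed by whitespace” in a repeated group can be checked the same way. This is a sketch using Python's `re` as a stand-in (assumed), with the stray “]” repaired to “[\s]+”, and compiled case-insensitively on the assumption that the rule is evaluated without regard to case (under “(?-i)” the `[A-Z]` TLD class would fail as before).

```python
import re

# Michael's pattern with "\s]+" repaired to "[\s]+"; "\@" is just a literal "@".
MULTI_RUN = re.compile(
    r"\b[A-Za-z0-9._%+-]+\@([A-Za-z0-9]+[\s]+)+[A-Za-z0-9]+\.[A-Z]{2,6}\b",
    re.IGNORECASE,  # assumption: the rule is matched case-insensitively
)

one_run = "support@stu pids.com"
many_runs = "support@stu pids pamme rs.com"

# The repeated group absorbs each "letters + spaces" chunk of the domain,
# so both addresses match in full.
print(MULTI_RUN.search(one_run).group(0))
print(MULTI_RUN.search(many_runs).group(0))

# A well-formed address has no whitespace run, so the group cannot match once.
print(MULTI_RUN.search("support@example.com"))
```

The `(...)+` group is what lifts the one-run limit; without at least one whitespace run, the pattern does not match at all.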

However, this is much more complicated than necessary. With SpamSieve’s blocklist, the regex doesn’t have to match the entire string being tested (the e-mail address), only a portion of it. So all you have to do is look for the invalid part, i.e., whitespace. Thus, I think it would work to simply make your regex:

\s
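Because a partial match is enough, a single `\s` flags any address that contains whitespace anywhere. A minimal sketch with Python's `re.search` (which, like the blocklist behavior described above, succeeds on a match at any position); `support@example.com` is a made-up well-formed address for contrast.

```python
import re

# One whitespace character anywhere is enough to flag the address.
WHITESPACE = re.compile(r"\s")

addresses = [
    "support@stu pids pamme rs.com",
    "support@zoo mpat hregi strat ions.c om",
    "support@example.com",  # hypothetical well-formed address
]

for addr in addresses:
    flagged = WHITESPACE.search(addr) is not None
    print(f"{addr!r}: {'block' if flagged else 'ok'}")
```

It does not matter how many runs of spaces there are or where they fall (domain, TLD, or local part).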

Feste · June 25, 2013, 7:35pm

Thanks for the quick reply. Unfortunately, neither of your reformulations works either. Here is an actual “From” field from one of the SPAMs I want to block: “1-2-3 Income Academy <support@zoo mpat hregi strat ions.c om>”
I tried setting a regex block rule using “All addresses” and then “From Address” with the recommended regex strings. None of these combinations worked against this actual SPAM “From” address.
After building the rule, I make sure the “active” checkbox is checked. With the unblocked message in my inbox, I right-click on it and select “apply rules” from the contextual menu. After a few seconds, I get the “beep” telling me that SpamSieve is done, but the message remains in the inbox instead of being moved to the Spam folder. I’m really surprised that SpamSieve didn’t come with a preset block rule for messages with ill-formed address fields.
Thanks for trying to help me anyway…
Regards,
Feste
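Separately from the rule-precedence issue resolved later in the thread, the real address above shows why the longer patterns are brittle: the spaces also split the TLD (“.c om”), so no two-to-six-letter run follows the only dot. A sketch using Python's `re` as a stand-in (assumed), with the stray “]” repaired and case-insensitive matching assumed:

```python
import re

MULTI_RUN = re.compile(
    r"\b[A-Za-z0-9._%+-]+\@([A-Za-z0-9]+[\s]+)+[A-Za-z0-9]+\.[A-Z]{2,6}\b",
    re.IGNORECASE,  # assumption: the rule is matched case-insensitively
)
WHITESPACE = re.compile(r"\s")

real = "support@zoo mpat hregi strat ions.c om"

# After the only "." comes "c om": one letter, then a space, so
# [A-Z]{2,6} has nothing to match and the long pattern fails.
print(MULTI_RUN.search(real))
# The whitespace-only rule does not care where the spaces were inserted.
print(WHITESPACE.search(real) is not None)
```

The first line prints `None` and the second `True`: the shape-describing pattern breaks as soon as the spammer moves the spaces, while `\s` keeps working.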

Michael_Tsai · June 26, 2013, 7:57am

I tried a test message with that From, and it worked on my Mac. If you still can’t get it to work, please save your message as “Raw Message Source” format and post a ZIP archive of it so that I can test it here.

If you look in the log, you can see what SpamSieve thought the From address was, and why it thought the message was not spam. Maybe it matched your whitelist or something. Or maybe the SpamSieve rule wasn’t actually applied to it.

As for a preset rule, I haven’t found one to be necessary. SpamSieve should be able to catch your spam automatically without this rule. If it’s not doing so, please send in a report. Also, FWIW, I just searched the 247,171 spam messages that I’ve received this year, and none of them had spaces in the From address.

Feste · June 26, 2013, 3:50pm

Michael said: “If you look in the log, you can see what SpamSieve thought the From address was, and why it thought the message was not spam. Maybe it matched your whitelist or something. Or maybe the SpamSieve rule wasn’t actually applied to it.”

Michael, thanks. I started using SpamSieve because I heard it supported user-generated regex rules. I’ve been inundated with SPAM with illegal addresses, but could find no filtering process that screened on that basis. Your coaching helped me get it to work. Looking at the logs, I discovered that a combination of the corpus and whitelist rules was trumping the regex rule that you gave me. I reset the corpus and deleted a bunch of whitelist rules that I had foolishly created by telling SpamSieve that the test spam was “good” so that I could get it back into my inbox to test the new rule. I’m not used to using whitelist rules, and it didn’t occur to me that I had caused SpamSieve to whitelist the test messages.

After cleaning up my whitelist and resetting the corpus, “\s” worked like a charm, but only because SpamSieve had the foresight to include a “From Address” rule type. My more complicated rule assumed that the filter would have to scan the entire “From” header, so it was written not to react to the whitespace in the identifying information (typically a name) that often precedes the actual email address. I underestimated the power of SpamSieve.
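The difference between testing the whole From header and testing only the address can be sketched as follows. The `from_address` helper is hypothetical and deliberately simplistic (it takes the text inside the last `<...>`); SpamSieve's own header parsing is not shown here and may differ. The legitimate sender “Jane Doe <jane@example.com>” is made up for illustration.

```python
import re

WHITESPACE = re.compile(r"\s")

def from_address(header: str) -> str:
    """Hypothetical helper: text inside the last <...>, else the whole header."""
    start = header.rfind("<")
    end = header.rfind(">")
    if start != -1 and end > start:
        return header[start + 1:end].strip()
    return header.strip()

spam = "1-2-3 Income Academy <support@zoo mpat hregi strat ions.c om>"
legit = "Jane Doe <jane@example.com>"  # made-up, well-formed sender

for header in (spam, legit):
    whole = WHITESPACE.search(header) is not None
    addr_only = WHITESPACE.search(from_address(header)) is not None
    # Against the whole header, the display name's spaces flag both messages;
    # against the address alone, only the ill-formed one is flagged.
    print(f"{header!r}: whole header={whole}, address only={addr_only}")
```

Applied to the whole header, `\s` would also block ordinary mail with a display name; applied to the address alone, it only catches the ill-formed sender.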

This product, with your coaching, has exceeded my expectations! Thanks again, Feste.