spamassassin-users mailing list archives

From "Kevin A. McGrail" <>
Subject Re: Help with a regex to catch spam with gibberish html tags
Date Thu, 30 Jan 2014 19:03:09 GMT
On 1/30/2014 12:39 PM, Amir Caspi wrote:
> On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail wrote:
>> If you want to share the complete rule, I can throw it into my 
>> sandbox and see what masscheck thinks as well.
> The complete rule would be something like this, assuming Andy 
> implemented it as I wrote it:
> rawbody   HTML_NONSENSE_TAGS  /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
> describe  HTML_NONSENSE_TAGS  Many consecutive multi-letter HTML tags, likely nonsense/spam
> Score to be adjusted as needed, of course.
> If one wants to be even more explicit, one could require that the tags 
> be prefaced with a <style> tag, although that should, hopefully, get 
> picked up by John Hardin's modifications to STYLE_GIBBERISH sometime 
> in the near future.
> Cheers.
> --- Amir
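For anyone who wants to poke at the pattern before masscheck results come in, here is a rough sketch in Python rather than in SpamAssassin itself. The regex is copied from the rule above; the helper name and the sample strings are made up for illustration, and Python's re module only approximates how a rawbody rule is applied to real mail.

```python
import re

# Pattern copied verbatim from the rule body quoted above.
NONSENSE_TAGS = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10,}")

def hits_nonsense_tags(raw_body: str) -> bool:
    """Return True if the text contains 10+ consecutive 4+-character tags."""
    return NONSENSE_TAGS.search(raw_body) is not None

# Hypothetical samples, not taken from any corpus.
words = ["qwzx", "plmk", "vbnt", "hjkl", "zxcv",
         "asdf", "rtyu", "ghjk", "mnbv", "wert"]
gibberish = " ".join(f"<{w}>" for w in words)        # ten tags in a row
nine_tags = " ".join(f"<{w}>" for w in words[:9])    # one short of the threshold
ordinary = "<html><body><p>Hello</p><table><tr><td>x</td></tr></table></body></html>"

print(hits_nonsense_tags(gibberish))
print(hits_nonsense_tags(nine_tags))
print(hits_nonsense_tags(ordinary))
```

Closing tags such as </p> and short tags such as <p> or <td> break a run, because the character class allows neither a slash nor fewer than four characters. That is what keeps the ordinary markup sample from matching.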
Added to the sandbox.  In a day or three, we should be able to check the 
ruleQA and see what it looks like on the masscheck corpora.

  svn commit -m 'Adding html tag gibberish tag rule for testing from Amir Caspi on the mailing list'
  Adding         rulesrc/sandbox/kmcgrail/
  Transmitting file data .
  Committed revision 1562916.

Rule is called AC_HTML_NONSENSE_TAGS and you can then look at it on

The S/O (spam/overall) ratio is the main thing to look at.
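As a rough illustration of how to read that number, the sketch below assumes S/O is the rule's spam hit rate divided by the sum of its spam and ham hit rates. The exact formula ruleQA uses is not given in this thread, so treat the function and its inputs as hypothetical.

```python
def so_ratio(spam_hit_pct: float, ham_hit_pct: float) -> float:
    """Fraction of a rule's hits that land on spam (assumed definition).

    spam_hit_pct: percentage of spam messages the rule hit
    ham_hit_pct:  percentage of ham messages the rule hit
    """
    total = spam_hit_pct + ham_hit_pct
    if total == 0:
        raise ValueError("rule hit no messages, so S/O is undefined")
    return spam_hit_pct / total

# Made-up hit rates, purely for illustration.
print(so_ratio(1.5, 0.0))   # rule that only ever hits spam
print(so_ratio(1.5, 0.5))   # rule with some ham hits
```

Under this definition a value near 1.0 means the rule fires almost only on spam, while a value near 0.5 means it does not separate spam from ham.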

