spamassassin-users mailing list archives

From Kris Deugau <kdeu...@vianet.ca>
Subject Re: Regex help (targetting very long HTML comments)
Date Mon, 09 Apr 2012 16:10:31 GMT
Adam Katz wrote:
> % grep html_text_match..comment 20_html_tests.cf

I hadn't known about that function until I saw Henrik's replies last 
week, so it would have been hard to search for it.

> Any more than 512 chars isn't going to be helpful but will end up being
> computationally expensive (I've played with this idea).  Also, I'd say
> this is more of a ham indicator than a spam indicator.

*shrug*  I happen to be getting a wave of ~400K spams that consist of 
about 1K of real HTML tags, loading the spam content via image from a 
remote server, with the remainder of that 400K message consisting of 
maybe four *very* long HTML comments (50K+) with nothing but gibberish 
(groups of ~4-8 words, separated by /, ;, # and occasionally some other 
symbol).
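
For anyone wanting to check a sample of messages for this pattern, here is a rough Python sketch. It is not a SpamAssassin rule and not anything from the stock ruleset. The function names `long_comments` and `comment_fraction` are made up for illustration, and the 50,000-character default simply echoes the "50K+" figure above. It assumes the HTML part has already been decoded to a string.

```python
import re

# Non-greedy match for a single HTML comment; DOTALL lets the body span lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def long_comments(html, threshold=50_000):
    """Return the body lengths of HTML comments at least `threshold` chars long."""
    lengths = (len(m.group(1)) for m in COMMENT_RE.finditer(html))
    return [n for n in lengths if n >= threshold]


def comment_fraction(html):
    """Return the fraction of the markup that sits inside HTML comments."""
    if not html:
        return 0.0
    commented = sum(len(m.group(0)) for m in COMMENT_RE.finditer(html))
    return commented / len(html)


if __name__ == "__main__":
    # Synthetic stand-in for the messages described: a little real markup
    # plus one very long comment full of filler.
    filler = "some random words / " * 4000
    sample = "<html><body><img src='x'><!--" + filler + "--></body></html>"
    print(long_comments(sample), round(comment_fraction(sample), 3))
```

A high comment fraction on its own would also catch the Outlook CSS-in-a-comment case mentioned below, so any real rule would presumably need a much higher length threshold or some check on what the comment contains.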

I've also seen gobs of mail with ~5K of CSS in an HTML comment - mostly 
from Outlook.  *eyeroll*

These are most of what's still getting through to *my* inbox, but with 
~50K users I'd assume they're hitting other people as well. 
Unfortunately, as an ISP sysadmin, my ability to get useful, timely 
feedback from a high proportion of the userbase is... limited.

-kgd
