spamassassin-users mailing list archives

From Adam Katz <antis...@khopis.com>
Subject FSL_RU_URL Re: whitelist
Date Thu, 23 Jun 2011 18:16:45 GMT
On 06/22/2011 05:42 PM, Noel Butler wrote:
> Resurrecting an old thread but....
> Lately I see a lot of false hits on   FSL_RU_URL
> The only place in the email where .ru is, is in envelope-from ,  from,
> and the received headers, this is supposed to be
> from   72_active.cf:uri    FSL_RU_URL      /[^\/]+\.ru(?:$|\/|\?)/i
> 
> (those also on the c-nsp list may also be seeing the same?)
> This only started recently.

Full rule, originating from rulesrc/sandbox/maddoc/99_fsl_testing.cf

uri      FSL_RU_URL      /[^\/]+\.ru(?:$|\/|\?)/i
tflags   FSL_RU_URL      nopublish
score    FSL_RU_URL      0.01

I see several problems here.

Chiefly, it's marked "nopublish" but is in some(?) copies of
72_active.cf (not trunk, and the rule is completely absent from the
current 3.3 and 3.2 svn branches) ... is this out of sync?  IIRC, we
fixed this problem a while ago, so perhaps Noel's system isn't properly
using sa-update, it hasn't propagated yet, or he's doing something fishy.

Scoring a rule in a sandbox is good for documentation purposes
(especially if mirroring a third-party sa-update channel), but has no
bearing on the resulting score published through the GA.  Therefore,
scoring something 0.01 as a safety net does nothing.  A rule with tflags
nopublish and score 0.01 is much safer (given our current bugs) if named
with the T_ prefix.  (Other devs, please correct me if I'm wrong here;
I'm not fully sure about the un-sandboxing mechanism.)

A safer and cleaner regex for that rule would be:

uri      FSL_RU_URL      m'^http://[^/:#?]+\.ru\b(?:$|[/:#?])'i

This prevents FPs like http://ham.example.com/how.ru and FNs like
http://spam.example.ru:8080/gotcha and uses a regex character class
(square brackets) rather than branches (pipes) for efficiency and
legibility purposes.  The \b also provides a (very minor) efficiency
boost.  It also excludes https links as they're more likely to be ham.
I moved to m'' to avoid the need to escape slashes.
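
To make the difference concrete, here is a rough sketch that runs both patterns over a few made-up URLs. It uses Python's re module as a stand-in for Perl (an assumption; these particular constructs behave the same in both), and it tests bare strings rather than going through SpamAssassin's own URI extraction, so treat it as an illustration of the regex logic only. The names OLD_RULE, NEW_RULE, SAMPLES and hits are hypothetical.

```python
import re

# Python approximations of the two Perl patterns quoted above.
# Assumption: plain strings stand in for the URIs SpamAssassin would extract.
OLD_RULE = re.compile(r"[^/]+\.ru(?:$|/|\?)", re.IGNORECASE)
NEW_RULE = re.compile(r"^http://[^/:#?]+\.ru\b(?:$|[/:#?])", re.IGNORECASE)

# Illustrative URLs only; none of these come from real mail.
SAMPLES = [
    "http://ham.example.com/how.ru",       # .ru in the path, not the host
    "http://spam.example.ru:8080/gotcha",  # .ru host followed by a port
    "https://shop.example.ru/",            # https link to a .ru host
    "http://spam.example.ru/",             # plain http link to a .ru host
]


def hits(url):
    """Return (old_rule_matches, new_rule_matches) for one URL string."""
    return bool(OLD_RULE.search(url)), bool(NEW_RULE.search(url))


if __name__ == "__main__":
    for url in SAMPLES:
        old, new = hits(url)
        print(f"{url:40} old={old!s:5} new={new!s:5}")
```

The first two samples are the FP and FN cases described above; the third shows the https exclusion, and the fourth is the case both patterns agree on.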

Even so, this is an awful rule on its own.  It would make more sense
with leading underscores (e.g. __FSL_RU_URL) as a subrule feeding a meta
rule that hunts a particular spam pattern.


As Ned answered, we need more information.  Specifically, tell us about
your setup: what version (and package) of SpamAssassin you are using,
your sa-update configuration, any hacks, etc.

Since FSL_RU_URL is so broad that it will match any link to any .ru
domain, we don't really need to see an example (unless you're confident
you have one that lacks an actual .ru link ... if the rule is triggering
on one of the headers you mention, that is a bug).

