spamassassin-dev mailing list archives

From Theo Van Dinter <felic...@kluge.net>
Subject Re: [SURBL-Discuss] checking plain domains in message bodies against SURBLs reportedly effective
Date Sat, 04 Sep 2004 17:36:53 GMT
On Sat, Sep 04, 2004 at 10:45:44AM -0600, Ryan Thompson wrote:
> Yep. Good idea, overall. There are a few gotchas:
> 
> TLD extensions sometimes map file extensions. We might have to whitelist
> command.com, and the entire country of Poland. :-)
> 
> Since the domain is in plain text and doesn't contain a protocol or
> subdomain (i.e., 'www'), I haven't yet seen a mail client that will
> display it as a clickable URL.

This is generally the tack we're taking in SpamAssassin -- if a general
MUA doesn't display it as a link, then we don't consider it a URL.
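Here is a rough sketch of that rule. It assumes that a typical MUA only turns text into a link when it has an explicit scheme or a leading "www." host, as Ryan describes above. It is Python for illustration only and is not SpamAssassin's actual Perl URI code. The pattern and the name clickable_urls are made up for the example, and real mail clients differ in the details.

```python
import re

# Hypothetical approximation of "would a general MUA linkify this?":
# require an explicit scheme (http, https, ftp) or a leading "www." host.
CLICKABLE = re.compile(r'\b(?:(?:https?|ftp)://|www\.)[^\s<>"]+', re.IGNORECASE)

def clickable_urls(text):
    """Return substrings a typical MUA would likely render as links."""
    # Drop trailing sentence punctuation that the greedy match picks up.
    return [m.rstrip('.,;:!?)') for m in CLICKABLE.findall(text)]
```

Under this rule, "Go take a look at command.com" yields no URLs at all, while "www.example.com" and "http://example.org/x" would be picked up.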

Another issue for the generic domains thing is performance -- lots of
messages have lots of things that could potentially look like a domain,
and querying for them all adds a bit of a load on both the client and the
server.

For instance:  /\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b/

in theory (I haven't tested it), will grab anything that looks like a
generic domain name in text.  If you check that list against a list of
valid TLDs, you'd probably end up with a decent list, but you'd hit the top
issue quoted above, where in "Go take a look at command.com" it isn't clear
whether it's a URL or a filename.
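The same idea can be sketched in Python. It uses the pattern above, translated from Perl syntax and still untested against real mail, followed by a check against valid TLDs. VALID_TLDS is a hypothetical, deliberately tiny stand-in for a real TLD list, and candidate_domains is an illustrative name.

```python
import re

# The generic-domain pattern quoted above, in Python syntax.
DOMAIN_RE = re.compile(r"\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b")

# Hypothetical TLD subset for illustration; a real filter would load the
# complete list of valid TLDs.
VALID_TLDS = {"com", "net", "org", "pl", "uk"}

def candidate_domains(text):
    """Return domain-like tokens whose last label is a known TLD."""
    found = []
    for match in DOMAIN_RE.findall(text):
        tld = match.rsplit(".", 1)[1].lower()
        if tld in VALID_TLDS:
            found.append(match.lower())
    return found
```

On "Go take a look at command.com, then read notes.txt", this keeps "command.com" and drops "notes.txt" only because "txt" is not a TLD. That is exactly the command.com ambiguity: the TLD check cannot tell a hostname from a filename. Every surviving candidate would also still cost a lookup, which is the performance concern raised above.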

-- 
Randomly Generated Tagline:
"Brevity is the soul of lingerie." - Dorothy Parker
