spamassassin-users mailing list archives

From Theo Van Dinter <felic...@apache.org>
Subject Re: URIBL_BLACK matching on messages with no URLs in them...
Date Mon, 02 Jul 2007 21:16:23 GMT
On Mon, Jul 02, 2007 at 01:28:27PM -0700, Jo Rhett wrote:
> Both of these assume I know every person who needs to e-mail me, and  
> everything they will send me.  Theo, you're active in enough open  
> source projects to know better.

Well, you just said you were receiving a large number of "system"-type mails,
which for me would all come from my own, well-defined set of systems.

> Well then we need to alter the code.  While bareword domain matching
> might make sense, it doesn't make sense for
> /a/valid/system/path/file.pl for "file.pl" to be checked.  Zero hits on
> spam corpus.

I think this is definitely a section of SA that could use some work, so ...
patches welcome. :)  As a start, PerMsgStatus::_get_parsed_uri_list() is the
function that goes through the text looking for hostnames or domains.  It
looks for both schemed URIs (http://.../) and schemeless URIs, which is where
you're getting hit.

Everything else, such as URIDNSBL, keys off of that.
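
To make the schemeless case concrete, here is a minimal sketch in Python.
It is not SpamAssassin's code (which is Perl). The regex, the tiny TLD set,
and both function names are assumptions made up for illustration. It shows
why a naive bareword matcher picks "file.pl" out of a filesystem path (.pl
is a valid TLD), and one possible way a patch could skip path components.

```python
import re

# Hypothetical, deliberately tiny TLD set; a real matcher needs a full list.
KNOWN_TLDS = {"com", "net", "org", "info", "pl"}

# Naive schemeless matcher: a dotted token whose last label is alphabetic.
SCHEMELESS = re.compile(
    r"(?<![\w.-])((?:[a-z0-9-]+\.)+([a-z]{2,}))(?![\w-])", re.I
)

def naive_hosts(text):
    """Return every dotted token that ends in a known TLD."""
    return [
        m.group(1).lower()
        for m in SCHEMELESS.finditer(text)
        if m.group(2).lower() in KNOWN_TLDS
    ]

def path_aware_hosts(text):
    """Same as naive_hosts, but skip tokens directly preceded by '/'."""
    hosts = []
    for m in SCHEMELESS.finditer(text):
        if m.group(2).lower() not in KNOWN_TLDS:
            continue
        if m.start() > 0 and text[m.start() - 1] == "/":
            continue  # looks like the last component of a path
        hosts.append(m.group(1).lower())
    return hosts

sample = "Error in /a/valid/system/path/file.pl, see example.com for details"
print(naive_hosts(sample))       # includes the path component file.pl
print(path_aware_hosts(sample))  # only the bareword domain
```

Note that this sketch only covers schemeless text. Skipping anything after
a "/" would also drop the host in http://example.com/, so schemed URIs
would still need their own pass, as they already get today.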


Random thought: URIDNSBL actually has a set of priorities when figuring out
which domains to query.  I wonder if the results would be better/worse if the
rules were based on the source type -- at least HTML versus parsed, but could
also be HTML tag, etc.
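
A rough sketch of that idea in Python, purely hypothetical: the source
labels ("html", "parsed"), the function name, and the blocklist set are all
assumptions, and none of this reflects how URIDNSBL is actually written.
It only shows what keeping hits separate per source type might look like,
so that rules could later score them differently.

```python
from collections import defaultdict

def hits_by_source(found, listed):
    """Group blocklisted domains by the source type they were found in.

    found:  iterable of (domain, source) pairs, e.g. ("example.com", "html")
    listed: set of domains a (hypothetical) blocklist lookup flagged
    """
    hits = defaultdict(set)
    for domain, source in found:
        if domain in listed:
            hits[source].add(domain)
    return {source: sorted(domains) for source, domains in hits.items()}

found = [
    ("example.com", "html"),    # came from an HTML tag
    ("file.pl", "parsed"),      # picked out of plain text
    ("example.net", "parsed"),
]
listed = {"example.com", "file.pl"}
print(hits_by_source(found, listed))
```

With hits separated like this, a hit that only shows up under "parsed"
could be scored lower than one that comes from an HTML link.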

-- 
Randomly Selected Tagline:
"G: And are you using Windows or a Mac?
  T: Neither, I'm using Linux.
  G: Oh, you're a power user."            - Theo and his ex-ISP
