Date: Sat, 4 Sep 2004 10:45:44 -0600 (CST)
From: Ryan Thompson
To: Jeff Chan, SURBL Discussion list
Cc: SpamAssassin Developers
Subject: Re: [SURBL-Discuss] checking plain domains in message bodies against SURBLs reportedly effective
In-Reply-To: <1411296005.20040904042613@supranet.net>
Message-ID: <20040904102324.O53861@drizzle.sasknow.net>

Jeff Chan wrote to SpamAssassin Developers:

> Randy Brukardt of rrsoftware.com mentioned that checking
> plain domains occurring in message bodies against SURBLs
> was pretty productive. (E.g., look for domain.com in
> addition to www.domain.com or http://www.domain.com).
>
> Perhaps this could be something interesting to at least try
> experimentally or to think about.

Yep. Good idea, overall. There are a few gotchas:

TLDs sometimes match file extensions. We might have to whitelist
command.com, and the entire country of Poland (.pl). :-)

Looking at the above sentence, leading/trailing punctuation might be a
snag. E.g.:

   4 cheap pillz, go to somethingsleazy.com, and give us your money.

Since the domain is in plain text and doesn't contain a protocol or
subdomain (e.g., 'www'), I haven't yet seen a mail client that will
display it as a clickable URL. Thus, with this, we're probably mostly
fighting the "type this in" or "cut and paste into your browser" type of
spammer.

So, if we do this, implementations could force spammers to obfuscate the
domains beyond recognition. They'll have to do their own munging, and we
might try to catch it, but that's risky. Consider ham like "i looked on
the boss' computer and found porn. info forthcoming...", or even,
"spammer dot com operations are a plague on civilized nations".

Any implementation will probably have to be run against large ham
corpora to see whether anything like the above is falsely *extracted* as
a URI, regardless of whether the current data happens to cause an FP.
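The gotchas above can be made concrete with a minimal sketch of strict plain-domain extraction. It is only an illustration, not how SpamAssassin or any SURBL client does it: the function name extract_plain_domains, the tiny TLDS set, the WHITELIST and the FILE_EXT_TLDS set are all hypothetical, and a real implementation would need a full TLD list and a tested whitelist. No SURBL lookup is performed; the sketch only decides which strings would be candidates for one.

```python
import re

# Hypothetical, deliberately tiny data sets for illustration only.
TLDS = {"com", "net", "org", "info", "biz", "pl"}
WHITELIST = {"command.com"}   # looks like a domain, is usually a file name
FILE_EXT_TLDS = {"pl"}        # TLDs that double as file extensions (Perl)

# Strict pattern: dot-joined labels with no whitespace, ending in an
# alphabetic label. The lookbehind skips hosts inside URLs ("/"), e-mail
# addresses ("@") and the tail of longer names; the lookahead lets trailing
# punctuation such as "," or "." end the match without being captured.
CANDIDATE = re.compile(
    r"(?<![\w@./-])"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}"
    r"(?![\w-])",
    re.IGNORECASE,
)


def extract_plain_domains(text, allow_file_ext_tlds=False):
    """Return bare domains found in text, in order, without duplicates."""
    found = []
    for match in CANDIDATE.finditer(text):
        domain = match.group(0).lower()
        tld = domain.rsplit(".", 1)[1]
        if tld not in TLDS:
            continue          # e.g. "report.txt"
        if domain in WHITELIST:
            continue          # e.g. "command.com"
        if tld in FILE_EXT_TLDS and not allow_file_ext_tlds:
            continue          # e.g. "script.pl" unless the user opts in
        if domain not in found:
            found.append(domain)
    return found


if __name__ == "__main__":
    spam = "4 cheap pillz, go to somethingsleazy.com, and give us your money."
    ham = "i looked on the boss' computer and found porn. info forthcoming..."
    print(extract_plain_domains(spam))
    print(extract_plain_domains(ham))
```

Because the pattern never joins text across whitespace, "porn. info" and "spammer dot com" are not extracted; catching those would need the kind of deobfuscation whose false-extraction rate would have to be measured against ham first.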

I'd advise keeping implementations simple and strict by default (i.e.,
no deobfuscation; maybe clickable links only), and letting the user
control the amount of fuzziness they'd like to match on.

- Ryan

-- 
   Ryan Thompson
   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669   (877-SASKNOW)       North America