Message-id: <4B083DAE.2020904@verizon.net>
Date: Sat, 21 Nov 2009 14:21:18 -0500
From: Matt Kettler
To: Joao Gouveia
Cc: dev@spamassassin.apache.org
Subject: Re: Strange ham corpus?
References: <1258828897.21880.73.camel@localhost>
In-reply-to: <1258828897.21880.73.camel@localhost>

Joao Gouveia wrote:
> (resending this, used a wrong email account ..)
>
> Hi,
>
> I was checking for FPs in our RBL, and noticed that most of them are
> hitting on a ham corpus that doesn't look very hammy to me:
>
> http://ruleqa.spamassassin.org/20091121-r882858-n/T_RCVD_IN_ANBREP_L3?mclog=ham-net-nbebout
>
> The scores are a bit strange (so are the rules being hit). Is this
> really supposed to be ham?
>

I have to admit, this does look like a spam corpus.

Of 77 messages:

62 hit RAZOR2_CF_RANGE_51_100
49 hit URIBL_BLACK
45 hit T_URIBL_META_SURBL_ANY
26 hit RCVD_IN_XBL
25 hit various JM_SOUGHT rules

Given the broad diversity of fairly reliable spam indicators all matching
heavily on this mail, this is either a spam corpus, or a corpus of email
from "shady" companies that do a lot of spamming but whose mail the corpus
maintainer actually subscribed to.
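
To put the counts above in proportion, here is a minimal Python sketch that
converts them to percentages of the 77 messages. The counts are copied from
this message; the function name hit_rates and the dictionary layout are
illustrative assumptions and are not part of SpamAssassin or the ruleqa
tooling.

```python
# Hit counts as quoted in this message; 77 is the number of messages in the log.
TOTAL_MESSAGES = 77
HITS = {
    "RAZOR2_CF_RANGE_51_100": 62,
    "URIBL_BLACK": 49,
    "T_URIBL_META_SURBL_ANY": 45,
    "RCVD_IN_XBL": 26,
    "JM_SOUGHT (various)": 25,
}


def hit_rates(hits, total):
    """Return {rule: percentage of messages hit}, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {rule: round(100.0 * count / total, 1) for rule, count in hits.items()}


if __name__ == "__main__":
    # Print one line per rule: name, raw count, and share of the corpus.
    for rule, pct in hit_rates(HITS, TOTAL_MESSAGES).items():
        print(f"{rule:<26} {HITS[rule]:>3}/{TOTAL_MESSAGES}  {pct:5.1f}%")
```

By this arithmetic roughly four in five of the messages hit
RAZOR2_CF_RANGE_51_100, and about a third hit RCVD_IN_XBL.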