Subject: Re: DNSBL Comparison 20091114
From: "richard@buzzhost.co.uk"
Reply-To: richard@buzzhost.co.uk
Cc: SpamAssassin Users List
In-Reply-To: <6c399e450911151234p7540c3damfcf0619128e05411@mail.gmail.com>
References: <4AFFB87C.4040407@redhat.com> <1258275197.7756.13.camel@testicle>
 <6c399e450911151234p7540c3damfcf0619128e05411@mail.gmail.com>
Organization: richard@buzzhost.co.uk
Date: Mon, 16 Nov 2009 06:00:07 +0000
Message-Id: <1258351207.7707.15.camel@testicle>

On Sun, 2009-11-15 at 20:34 +0000, Justin Mason wrote:
> On Sun, Nov 15, 2009 at 08:53, richard@buzzhost.co.uk wrote:
> > On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
> >> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3C4AD11C44.9030201@redhat.com%3E
> >> Compare this report to a similar report last month.
> >>
> >> http://wiki.apache.org/spamassassin/NightlyMassCheck
> >> The results below are only as good as the data submitted by nightly
> >> masscheck volunteers. Please join us in nightly masschecks to
> >> increase the sample size of the corpora so we can have greater
> >> confidence in the nightly statistics.
> >>
> >> http://ruleqa.spamassassin.org/20091114-r836144-n
> >> Spam 131399 messages from 18 users
> >> Ham  189948 messages from 18 users
> >>
> >> ============================
> >> DNSBL lastexternal by Safety
> >> ============================
> >> SPAM%      HAM%     RANK  RULE
> >> 12.8342%   0.0021%  0.94  RCVD_IN_PSBL *
> >> 12.3053%   0.0026%  0.94  RCVD_IN_XBL
> >> 31.2499%   0.0827%  0.87  RCVD_IN_ANBREP_BL *2
> >> 80.2578%   0.1485%  0.86  RCVD_IN_PBL
> >> 27.1836%   0.1985%  0.79  RCVD_IN_SORBS_DUL
> >> 19.8213%   0.1785%  0.79  RCVD_IN_SEMBLACK *
> >> 90.9360%   0.3854%  0.77  RCVD_IN_BRBL_LASTEXT
> >> 13.0564%   0.4838%  0.67  RCVD_IN_HOSTKARMA_BL *
> >>
> >> Commentary:
> >> * PSBL and XBL lead in apparent safety.
> >> * ANBREP was added after the October report and has made a
> >> surprisingly strong showing in this past month. ANBREP is currently
> >> unavailable to the general public. The list owner is thinking about
> >> going public with the list, which I would encourage because they are
> >> clearly doing something right. It seems he would need a global
> >> network of automated mirrors to be able to scale. He would also need
> >> a listing/delisting policy clearly stated on a web page somewhere.
> >> * SEMBLACK has consistently performed adequately in safety while
> >> catching a respectable amount of spam. I personally use this
> >> non-default blacklist.
> >> * It is clear that the two main blacklists are Spamhaus and BRBL.
> >> The Zen combination of Spamhaus zones is extremely effective and
> >> generally safe. BRBL has a high hit rate as well, with a moderate
> >> safety rating.
> >> * HOSTKARMA_BL ranks dead last in safety for the past several weeks
> >> in a row, while not being more effective against spam than PSBL,
> >> XBL or SEMBLACK.
> >>
> >> =================================
> >> HOSTKARMA_BL much better as URIBL
> >> =================================
> >> SPAM%      HAM%     RANK  RULE
> >> 68.3651%   0.2806%  0.79  URIBL_HOSTKARMA_BL *
> >>
> >> Commentary:
> >> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is
> >> surprisingly effective as a URIBL. This is curious, as it seems it
> >> was not designed to be used as a URIBL. In any case, as long as our
> >> masschecks show good statistics like this, I will personally use
> >> this on my own spamassassin server.
> >>
> >> ==================
> >> SPAMCOP Dangerous?
> >> ==================
> >> SPAM%      HAM%     RANK  RULE
> >> 17.4225%   2.6076%  0.56  RCVD_IN_BL_SPAMCOP_NET *
> >>
> >> Commentary:
> >> Is Spamcop seriously this bad? It has consistently shown a high
> >> false-positive rate in these past weeks. Was it safer than this in
> >> the past, to warrant the current high score in spamassassin-3.2.5?
> >>
> >> Warren Togami
> >> wtogami@redhat.com
> >
> > Is it not a bit flawed to do the metrics on volunteer submissions,
> > given that Spamhaus is said to have a small army of them? It means
> > the data cannot be relied upon as any kind of sensible comparison.
>
> please explain. How would you suggest measuring false positives?
> Do you think that volunteer submissions are an accurate way to do
> them, or do you think that is open to abuse?

For example, say I am Steve Linford with a small army of volunteers. A
few false positives come in from Spamhaus, and a few from SORBS. What is
my inclination when I submit the data? It takes only a small amount of
research and a trawl through the NANAE archives to get a handle on the
problem, and on the general abuse and nefarious goings-on among DNSBL
volunteers. It is fair to say that there is not much love lost.

I'm not pretending I have the answers, so it's probably better to take
these lists with a large bucket of salt and find out how any given DNSBL
works for a given organisation (a rough local.cf sketch of that sort of
per-site tuning is below). In a world where presidents and world leaders
in America, Zimbabwe and Afghanistan get 'elected' on tainted data, some
random RBL 'comparison' list is trivial by comparison. It must, however,
be duly remembered that there are many competing 'sides' in the world of
DNSBLs, each looking to discredit the others. Perhaps, Jim, as you posed
the question, you have some strong feelings on the matter that you would
like to share?
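
As a starting point for that kind of per-site tuning, here is a minimal
local.cf sketch. The rule names are the ones from Warren's tables; the
zone names (psbl.surriel.com, hostkarma.junkemailfilter.com) and the
127.0.0.2 'black' return code are my reading of those lists' published
lookup details, so double-check them against the lists' own pages, and
treat the scores as placeholders to tune against your own mail rather
than recommendations:

  # local.cf sketch, not stock rules; verify zones and return codes
  # against each list's documentation before relying on them.

  # Tame Spamcop locally if its FP rate is too high for your site
  # (pick a value that suits your own ham; 0 effectively disables it).
  score     RCVD_IN_BL_SPAMCOP_NET  1.0

  # PSBL checked against the last external relay only, via the DNSEval
  # plugin's check_rbl(); the "-lastexternal" suffix on the set name is
  # what restricts the lookup to the last untrusted relay.
  header    RCVD_IN_PSBL        eval:check_rbl('psbl-lastexternal', 'psbl.surriel.com.')
  describe  RCVD_IN_PSBL        Last external relay listed in PSBL
  tflags    RCVD_IN_PSBL        net
  score     RCVD_IN_PSBL        2.0

  # Hostkarma's black list used as a URIBL (URIDNSBL plugin): domains of
  # URIs found in the body are looked up in the zone, and the rule fires
  # on a 127.0.0.2 answer.
  urirhssub URIBL_HOSTKARMA_BL  hostkarma.junkemailfilter.com.  A  127.0.0.2
  body      URIBL_HOSTKARMA_BL  eval:check_uridnsbl('URIBL_HOSTKARMA_BL')
  describe  URIBL_HOSTKARMA_BL  URI domain listed in Hostkarma black
  tflags    URIBL_HOSTKARMA_BL  net
  score     URIBL_HOSTKARMA_BL  2.0

This assumes the stock DNSEval and URIDNSBL plugins are loaded and
network tests are enabled. It does not answer the bias question, of
course; it only lets a site weigh each list against its own ham and
spam rather than against the masscheck corpora.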