spamassassin-dev mailing list archives

From: Daniel Quinlan <quin...@pathname.com>
Subject: Re: Announcing SpamCopURI 0.08 support of SURBL for spam URI domain tests
Date: Sat, 03 Apr 2004 20:52:23 GMT

Jeff Chan <jeffc@surbl.org> writes:

> I agree with the content check, but will step on many toes here
> by proclaiming that other blacklists (other than SBL), name
> servers, registrars, ISP address blocks, and similar approaches
> are overly broad and have too much potential for collateral
> damage *for my sensibilities*.

There are other blacklists just as accurate as SBL (and some more
accurate).  And bear in mind these are secondary checks to lower the
threshold for a URI already reported to SpamCop so the accuracy should
be really good (two 99% accurate features => more than 99% accurate
together).
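
To make the "two 99% accurate features" arithmetic concrete, here is a
minimal sketch in Python.  It assumes the two tests fire independently
given the message class and that spam and ham are equally likely
beforehand; neither assumption is measured anywhere in this thread, and
the function name is purely illustrative.

```python
def combined_precision(p1: float, p2: float, prior: float = 0.5) -> float:
    """Probability that a message is spam when two tests both hit.

    p1, p2: P(spam | test hits) for each test on its own.
    prior:  P(spam) before any test is looked at (assumed, not measured).
    Assumes the tests are conditionally independent given spam/ham.
    """
    prior_odds = prior / (1.0 - prior)
    # Likelihood ratio contributed by each test, relative to the prior.
    lr1 = (p1 / (1.0 - p1)) / prior_odds
    lr2 = (p2 / (1.0 - p2)) / prior_odds
    posterior_odds = prior_odds * lr1 * lr2
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Two tests that are each right 99% of the time when they hit.
    print(f"{combined_precision(0.99, 0.99):.6f}")
```

With both tests at 0.99 and an even prior, the combined odds are 99 * 99
to 1, so the result is well above 0.99.  Tests that are correlated (for
example, two lists fed by the same reports) would gain less than this.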

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  69948    37790    32158    0.540   0.00    0.00  (all messages)
100.000  54.0258  45.9742    0.540   0.00    0.00  (all messages as %)
  1.016   1.8815   0.0000    1.000   0.93    8.60  RCVD_IN_OPM_SOCKS
  2.918   5.3956   0.0062    0.999   0.94    0.62  RCVD_IN_NJABL_DIALUP
  1.138   2.1037   0.0031    0.999   0.93    8.60  RCVD_IN_OPM_HTTP
  1.107   2.0455   0.0031    0.998   0.93    8.60  RCVD_IN_OPM_HTTP_POST
  7.769  14.3292   0.0591    0.996   0.94    1.27  RCVD_IN_SBL
  2.698   4.9749   0.0218    0.996   0.93    0.53  RCVD_IN_RSL
 19.630  36.1842   0.1772    0.995   0.97    2.55  RCVD_IN_SORBS_DUL
  3.127   5.7581   0.0342    0.994   0.92    0.74  RCVD_IN_NJABL_SPAM
  9.759  17.9360   0.1493    0.992   0.93    1.20  RCVD_IN_SORBS_MISC
  5.067   9.3146   0.0746    0.992   0.92    0.01  T_RCVD_IN_AHBL_SPAM
  0.815   1.4978   0.0124    0.992   0.91    1.20  RCVD_IN_SORBS_SMTP
 32.202  59.1532   0.5317    0.991   0.99    1.10  RCVD_IN_DSBL
 17.386  31.8735   0.3607    0.989   0.95    1.00  RCVD_IN_XBL
 13.524  24.8002   0.2736    0.989   0.94    1.20  RCVD_IN_NJABL_PROXY
  9.088  16.6711   0.1772    0.989   0.93    1.20  RCVD_IN_SORBS_HTTP

(some older mail being tested, so these numbers are going to be somewhat
off)
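
For anyone reading the table, the S/O column is consistent with
SPAM% / (SPAM% + HAM%), i.e. the share of a rule's hits that land on
spam once both corpora are weighted equally.  The sketch below
recomputes it for a few rows copied from the table; the helper name is
made up for illustration and is not part of SpamAssassin.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's (percentage-weighted) hits that are spam."""
    return spam_pct / (spam_pct + ham_pct)


# (rule name, SPAM%, HAM%) copied from the table above.
rows = [
    ("RCVD_IN_OPM_SOCKS", 1.8815, 0.0000),
    ("RCVD_IN_SBL", 14.3292, 0.0591),
    ("RCVD_IN_DSBL", 59.1532, 0.5317),
    ("RCVD_IN_XBL", 31.8735, 0.3607),
]

if __name__ == "__main__":
    for name, spam_pct, ham_pct in rows:
        print(f"{name:20s} {s_o_ratio(spam_pct, ham_pct):.3f}")
```

Rounded to three places, these agree with the S/O values listed for the
same rules.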

> I really, really hate blacklisting innocent victims.  I consider that
> a false accusation or even false punishment.  Having policies which
> allow blacklisting an entire ISP or even an entire web server IP
> address have the potential to harm too many innocent bystanders, IMO.
> Your mileage may and probably does vary.  ;)

You already have a repeated URL.  Are you just railing about other
blacklists or did you really consider my suggestion?  SpamCop is no more
accurate than the above blacklists.  People report ham all the time,
sometimes repeatedly.
 
> Our approach is to start with some likely good data in the
> SpamCop URIs.  See comments below.

And these are ways to make the data more accurate.
 
> I agree in principle, however I feel that the SpamCop reported
> URIs tend to have relatively few FPs.  They are domains that
> people took the time to report; in essence they are *voting with
> their time that these are spam domains*.

Again, SpamCop has false positives.  It is no magic bullet.  Some
mailing lists are very low volume so when an announcement or conference
notice goes out, people report it as spam even though they actually
subscribed.  It happens all the time.

I think pre-seeding a whitelist would be a sensible precaution against
joe jobs and the more sporadic type of false positive (sporadic for any
one domain, that is; SpamCop as a whole probably has false positives
every day).
 
> I hope I'm not taking too confrontational a tone here.  I'm just
> trying to defend our approach, which I think can be valid.

Nobody is attacking your approach.  I only made these suggestions to
potentially allow you to selectively lower or raise your threshold for
specific URLs based on other data and therefore increase your accuracy
and spam hit rate.  I suspect your blacklist will work well once a
plug-in supports it, but until then it seems like further discussion is
a waste of my time.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
