spamassassin-dev mailing list archives

From Jeff Chan <>
Subject Re: Announcing SpamCopURI 0.08 support of SURBL for spam URI domain tests
Date Sat, 03 Apr 2004 09:09:00 GMT
On Friday, April 2, 2004, 9:02:59 PM, Loren Wilton wrote:
> Jeff, I had a look at your list at some random time a few days ago.  I
> noticed that the top 90% or so of the reports looked pretty solid.  At the
> instant I looked, the bottom 10% of the reports were most all highly
> suspect.  This is where the yahoo and geocities and other whitelist stuff
> was showing up.  Some other reports (and I can't remember what any of them
> were) also seemed somewhat suspect, even though they probably weren't on a
> whitelist.

> I concluded that only the top 90% of your reports should be used in the
> blocking test, and ignore the reports with less than 10% of the
> highest-scoring report.  Now, perhaps this percentage fluctuates with time; I
> certainly haven't made multiple checks to see.  And maybe after whitelist
> removal the rest of the bottom 10% really is spam.

> But I think it would be an interesting experiment to compare the reliability
> of the top 90% to the reliability of the entire collection.
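The relative cutoff proposed above (drop any domain whose report count is below 10% of the highest count) can be sketched in Python as follows. This is a hypothetical illustration only. The function name relative_cutoff, the domain-to-count dictionary input, and the sample numbers are assumptions and are not part of the SURBL tooling.

```python
# Hypothetical sketch of a relative cutoff: keep only domains whose
# report count is at least `fraction` of the highest count on the list.
def relative_cutoff(counts: dict[str, int], fraction: float = 0.10) -> dict[str, int]:
    """Return the entries whose count is >= fraction * max(count)."""
    if not counts:
        return {}
    floor = fraction * max(counts.values())
    return {domain: n for domain, n in counts.items() if n >= floor}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    reports = {"spam-example.test": 200, "pills-example.test": 45, "geocities.com": 12}
    # With a top count of 200 the floor is 20, so geocities.com (12) is dropped.
    print(relative_cutoff(reports))
```

Note that a relative cutoff moves with the top count, whereas the list described below uses a fixed count threshold.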

Thanks for checking this over for us!  It looks like you visited:

which does not have the whitelist entries removed from it and
which does not go all the way down to the threshold of 10 spams.

The full list which is about 11000 entries can be seen at:

This is the basis for the thresholded list of 400 or so domains at:

which doesn't show the counts used for thresholding, but every entry
on it got over 10 counts.  It does, however, have some duplicates
like for eliminated and, perhaps most
importantly, *has had the whitelisted domains and two-level ccTLDs
removed*.  It is the basis for the RBL:

Due to the whitelisting and thresholding, the domains that make it
into SURBL are quite spammy: hopefully, and probably, more than the
90% you estimated on the unfiltered list.
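A minimal sketch of the filtering steps described above, assuming the raw list is available as a mapping from domain to spam-report count. The WHITELIST and TWO_LEVEL_CCTLDS sets, the function name build_list, and the inclusive ">= 10" boundary are illustrative assumptions. The real SURBL whitelist, ccTLD handling and exact cutoff are not shown in this message.

```python
from collections import Counter

# Example entries only; NOT the actual SURBL whitelist or ccTLD list.
WHITELIST = {"yahoo.com", "geocities.com"}
TWO_LEVEL_CCTLDS = {"co.uk", "com.au", "co.jp"}


def build_list(counts: dict[str, int], min_count: int = 10) -> list[str]:
    """Hypothetical pipeline: merge duplicate spellings, apply a count
    threshold, then drop whitelisted domains and bare two-level ccTLDs."""
    merged: Counter[str] = Counter()
    for domain, n in counts.items():
        # Normalize case and trailing dots so duplicates collapse
        # and their counts are summed before thresholding.
        merged[domain.strip().lower().rstrip(".")] += n
    return sorted(
        d
        for d, n in merged.items()
        if n >= min_count  # assumed inclusive threshold
        and d not in WHITELIST  # never list known-good domains
        and d not in TWO_LEVEL_CCTLDS  # never list a registry suffix itself
    )


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "spam-example.test": 57,
        "Spam-Example.test.": 14,
        "pills-example.test": 10,
        "geocities.com": 300,
        "co.uk": 40,
        "rare-example.test": 3,
    }
    print(build_list(sample))
```

Merging before thresholding means a domain whose reports are split across duplicate spellings is judged on its combined count.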


Jeff C.
Jeff Chan
