spamassassin-dev mailing list archives

From Jeff Chan <>
Subject Re: Spam URI TLD report sizes
Date Tue, 30 Mar 2004 23:50:16 GMT
On Tuesday, March 30, 2004, 3:18:02 PM, Daniel Quinlan wrote:
> Jeff Chan <> writes:
>> FWIW Here's a du -sk directory size summary of the reports
>> SURBL grabbed from SpamCop Spamvertised sites over the past
>> 4 days or so, stored by TLD or first octet of a numeric URI:
>> KBytes  TLD or first octet of numeric address

> It might be interesting to do checks on /24 networks since spammers will
> often get a whole block of addresses and divvy up their current domains
> amongst them.

> If it's possible and not too much work for you, it might be worth trying
> a bunch of different approaches on different temporary subdomains and
> then we can compare each against our corpora.

> - longer timeout vs. shorter timeout
> - lower threshold vs. higher threshold
> - gathering /24 networks for numeric addresses (combined with an A
>   lookup of non-numeric addresses on your end).

To correct what I said earlier: the data used for this is the source
for SURBL (and not vice versa :-).  None of the thresholding that SURBL
does is reflected in my previous posting.  That was based on the
raw report data, including any reports that don't reach
the SURBL threshold.

/24s are visible in the data as:  where N is a number

Similarly /16, and /32

with the exception that we skipped /8s because they would be
too inclusive, just like we skipped accumulated reports of TLDs.
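
To make the bucketing concrete, here is a rough Python sketch of how a
numeric URI host could be tallied under its /16, /24 and /32 prefixes
while skipping the /8.  This is only an illustration, not the code that
produced the data: the function names (prefix_keys, tally) and the
dotted key format ("211.1", "211.1.2", ...) are assumptions of mine.

```python
import ipaddress
from collections import Counter

def prefix_keys(host):
    """Return hypothetical /16, /24 and /32 keys for a dotted-quad host.

    Non-numeric hosts (domain names) yield an empty list; they would be
    filed by TLD instead, which this sketch does not cover.
    """
    try:
        addr = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        return []
    octets = str(addr).split(".")
    # 2, 3 and 4 leading octets give the /16, /24 and /32 keys.
    # The /8 (first octet alone) is deliberately skipped as too inclusive.
    return [".".join(octets[:n]) for n in (2, 3, 4)]

def tally(hosts):
    """Count reports per prefix key across an iterable of reported hosts."""
    counts = Counter()
    for host in hosts:
        counts.update(prefix_keys(host))
    return counts
```

With this, two reports in the same /24 both count toward the same
three-octet key, which is the grouping Daniel was asking about.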

So for example, under 211 there is data for:

Some of the other 211/16s are much larger, such as:

No name resolution is done on the domain name data.
No reverse name resolution is done on the numeric IP Address
data from URIs.  It's all just a record of what was reported.
We could do resolution, and may in future, but that's not our
primary focus for the use of the data.

Jeff C.
Jeff Chan
