spamassassin-users mailing list archives

From Warren Togami <wtog...@redhat.com>
Subject Harvested Fresh .cn URIBL
Date Wed, 07 Oct 2009 15:00:19 GMT
http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A sizeable share of spam (currently 50%) contains .cn domains that were 
registered very recently.  Spammers keep registering new domains in 
order to stay ahead of the URIBLs.

http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_EIGHT/detail
Last month, I noticed that a very sizeable percentage of the .cn spam 
used fresh, random \w{8}.cn domain names.

http://ruleqa.spamassassin.org/20091007-r822624-n/T_CN_SEVEN/detail
I don't know if it was due to our discussion here, but for whatever 
reason I began seeing new spam with \w{7}.cn domains registered since 
October 3rd, and \w{8}.cn seems to be tapering off now.

http://spameatingmonkey.com/lists.html#SEM-FRESH
A \w{8}.cn pattern, at this or any other length, is unsafe to use as a 
real rule.  The only safe way to detect these fresh .cn domains would be 
a URIBL.  But URIBLs like SEM-FRESH, described at the link above, can 
only learn of new domains in TLDs that provide zone files that can be 
compared.
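
One likely reason a length-based pattern is unsafe (the message does not 
spell it out) is that it cannot tell a random spam domain from an 
ordinary eight-character name.  A minimal Python sketch; both domain 
names below are hypothetical examples, not domains from the corpus:

```python
import re

# The message writes the pattern as \w{8}.cn; the dot is escaped and the
# pattern anchored here so that it matches whole domain names only.
EIGHT_CN = re.compile(r"^\w{8}\.cn$")

# Hypothetical examples: a random-looking name, a dictionary word of the
# same length, and a shorter name for contrast.
candidates = ["qxzvkwpj.cn", "shopping.cn", "shop.cn"]

for name in candidates:
    print(name, bool(EIGHT_CN.match(name)))
```

The first two names both match, so the pattern alone says nothing about 
whether a domain is fresh or abusive.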

It seems, then, that the only way to feed fresh .cn domains to a URIBL 
would be a spam trap.  This proposed URIBL would be extremely easy to 
build on the infrastructure of existing trap-based DNSBLs like PSBL, 
HOSTKARMA or SEM.  My own volume of spam is too small to do this.

A targeted URIBL verified by whois for registration dates would be near 
100% accurate and deserving of a high score.  This would hopefully break 
the economic feasibility of .cn URI spam by rendering fresh domains 
quickly useless.  This could be a new URIBL, or an existing URIBL.  If 
it is an existing URIBL, SpamAssassin can use a meta rule to combine the 
URIBL hit with a .cn URI match and assign a higher score.  Example:

meta FRESHCN_7 SOME_URIBL && CN_URL
score FRESHCN_7 0 4.0 0 4.0

Spam Trap Workflow
==================
1. Spam trap receives spam containing .cn URI.
2. Look up the domain locally: is this .cn domain already known?
3. If already known, stop.
4. Look up the A record of this domain.  If NXDOMAIN, stop.
5. Record domain in database with UNKNOWN registration date.
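
The trap-side steps above could be sketched roughly as follows.  This is 
only an illustration under stated assumptions: the table name 
cn_domains, the helper names, and the use of SQLite are all hypothetical, 
the URI regex is simplified (it treats "com.cn" as a registrable name), 
and has_a_record() treats any resolution failure as a reason to stop, 
not only NXDOMAIN.

```python
import re
import socket
import sqlite3

# Captures the last two labels of a .cn host in an http(s) URI.
# Simplified: does not special-case second-level zones such as .com.cn.
CN_DOMAIN_RE = re.compile(
    r"https?://(?:[\w-]+\.)*([\w-]+\.cn)(?=[/:?#\s]|$)", re.IGNORECASE
)

def open_db(path=":memory:"):
    """Open the (hypothetical) domain database; NULL date means UNKNOWN."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cn_domains ("
        " domain TEXT PRIMARY KEY,"
        " registered TEXT)"
    )
    return db

def has_a_record(domain):
    """Step 4: True if the name resolves to an address."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:
        return False

def process_trap_message(db, body, resolves=has_a_record):
    """Apply steps 1-5 to one trapped message; return newly recorded domains."""
    added = []
    for domain in {m.lower() for m in CN_DOMAIN_RE.findall(body)}:  # step 1
        known = db.execute(
            "SELECT 1 FROM cn_domains WHERE domain = ?", (domain,)
        ).fetchone()
        if known:                  # steps 2-3: already known, stop
            continue
        if not resolves(domain):   # step 4: does not resolve, stop
            continue
        db.execute(                # step 5: record with UNKNOWN date
            "INSERT INTO cn_domains (domain, registered) VALUES (?, NULL)",
            (domain,),
        )
        added.append(domain)
    db.commit()
    return sorted(added)
```

Passing the resolver in as a parameter keeps the DNS step replaceable, 
for example with a caching resolver or a stub during testing.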

URIBL Generation Workflow
=========================
1. If domain has UNKNOWN registration date, attempt whois lookup.
    Record registration date if found.
2. Ignore all UNKNOWN records.
3. Dump all domains registered in the last 7 days into one zone.
    score FRESHCN_7 0 4.0 0 4.0
4. Dump all domains registered in the last 14 days into another zone.
    score FRESHCN_14 0 2.0 0 2.0
5. Stop listing anything older than 14 days.  By then the regular 
    URIBLs have listed these domains.
6. Do not delete older .cn domains.  Keeping them in the database 
    prevents redundant whois lookups later.
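
A rough Python sketch of the generation steps above, for illustration 
only.  The function names and the in-memory dict are hypothetical, the 
whois step is passed in as a callable rather than implemented, and 
treating "last 7 days" as an inclusive age of 0 to 7 days is an 
assumption.

```python
from datetime import date

def refresh_unknown(records, whois_date):
    """Step 1: try a whois lookup for every domain with an UNKNOWN date.

    records maps domain -> registration date, or None for UNKNOWN.
    whois_date is a hypothetical callable returning a date or None.
    """
    for domain, registered in records.items():
        if registered is None:
            found = whois_date(domain)
            if found is not None:
                records[domain] = found

def build_zones(records, today):
    """Steps 2-5: return (7-day zone, 14-day zone) as sorted domain lists."""
    fresh_7, fresh_14 = [], []
    for domain, registered in sorted(records.items()):
        if registered is None:
            continue              # step 2: ignore UNKNOWN records
        age = (today - registered).days
        if 0 <= age <= 7:
            fresh_7.append(domain)    # step 3
        if 0 <= age <= 14:
            fresh_14.append(domain)   # step 4
        # step 5: older domains are listed nowhere; step 6: nothing is
        # deleted from records, so no domain is looked up twice.
    return fresh_7, fresh_14
```

As written in the workflow, a domain registered within the last 7 days 
appears in both zones, so both rules would fire on it.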

The only challenging part here is whois lookup rate limiting.  whois 
lookups are critical to populating this URIBL, but it is a resource that 
can only be used in small quantities.  The above workflow attempts to 
minimize the number of whois lookups.
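
One simple way to stay within a lookup budget is to space the whois 
queries out.  The sketch below is an assumption on my part, not part of 
the proposal: the class name and the interval are hypothetical, and the 
actual CNNIC limit is unknown.

```python
import time

class Throttle:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep
        self.last = None      # time of the previous permitted call

    def wait(self):
        """Block until the next call is permitted, then record its time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Calling wait() before each whois query would cap the query rate 
regardless of how many UNKNOWN domains are queued.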

Given that only spammers would send mail to a trap, the number of .cn 
domain names might be small enough to handle whois lookups.  The goal 
here is to break the economic model.  I'm told that .cn domains cost 
$3-10 each to register, and whois lookups are certainly cheaper to 
automate.  I can't find a published whois rate limit for CNNIC.  In any 
case, it wouldn't be difficult for us to proxy whois lookups to bypass 
rate limits should that become necessary.

Opinions on this proposal?

Is anyone from PSBL, HOSTKARMA, or SEM interested?

Warren Togami
wtogami@redhat.com
