From Mark Martinec <Mark.Martinec...@ijs.si>
Subject Re: xxxl spam
Date Wed, 12 Apr 2006 10:39:22 GMT
Justin,

> Mark Martinec writes:
> > As a curiosity (but off topic), harvesting results from p0f
> > (passive operating system fingerprinting), here are two more:
> >   http://www.ijs.si/software/amavisd/fig1.gif
> >     Spam score vs. IP distance in hops (our server is
> >     in European academic network Geant)
> >   And perhaps most interesting of all (but again OT):
> >   http://www.ijs.si/software/amavisd/fig2.gif
> >     Spam score distribution as a percentage of all mail,
> >     separated by each sending mail client's operating system.

> That's excellent data!  Mind if I forward that around to another
> list or two?

I don't mind.

> The "hops" measurement is particularly interesting.  Have you got that
> implemented as a working rule, in the field?  Is it expensive?

Yes, it is implemented in the field and comes with the latest amavisd-new-2.4.0.
It inserts one header field with the collected information into the mail header,
making it available to SA to score as it wishes (custom rules, bayes).
It could probably just as well be implemented as an SA plugin (making use
of the supplied lightweight p0f-analyzer.pl interface to p0f), but it was
easier for me to do it in amavisd-new, where the remote SMTP client's IP
address is accessible directly, with no need to parse the header and
understand the topology.

It is reasonably inexpensive: the cost of running the p0f utility is
comparable to running tcpdump. It takes about one hour of CPU time per month
on our medium-busy mailer; the rest is negligible, with no additional latency
and no additional network traffic.

The most interesting part in my view is not the IP distance, but the
type of OS, illustrated by the following table (derived from the same
data as fig2):

    p0f OS guess    ham :   spam
    -----------------------------
    Windows-XP    0.7 % : 99.3 %
    Windows-2000  5.8 % : 94.2 %
    UNKNOWN      16.5 % : 83.5 %
    Linux        58.8 % : 41.2 %
    Unix         80.3 % : 19.7 %
    (Unix+Linux  66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!!
This is ideal information for contributing two or three score points.
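
To put the table in terms of odds, here is a small Python sketch. It is not part of amavisd-new; the percentages are copied from the table above, and the names HAM_SPAM_PERCENT and spam_odds are made up for illustration:

```python
import math

# Ham/spam percentages per p0f OS guess, copied from the table above.
HAM_SPAM_PERCENT = {
    "Windows-XP": (0.7, 99.3),
    "Windows-2000": (5.8, 94.2),
    "UNKNOWN": (16.5, 83.5),
    "Linux": (58.8, 41.2),
    "Unix": (80.3, 19.7),
}


def spam_odds(ham_pct, spam_pct):
    """Return the spam-to-ham odds for one OS class."""
    return spam_pct / ham_pct


if __name__ == "__main__":
    for os_name, (ham, spam) in HAM_SPAM_PERCENT.items():
        odds = spam_odds(ham, spam)
        # log10 of the odds is positive when spam dominates, negative for ham.
        print(f"{os_name:13s} odds {odds:8.2f}:1  log10 {math.log10(odds):+.2f}")
```

For Windows-XP this gives roughly 142:1 in favor of spam, and for Unix roughly 4:1 in favor of ham. How such odds translate into SA score points remains a judgment call for each site.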

Traffic from the site's own PC clients must not be seen by p0f, otherwise one
would be penalizing the site's own users. This can be achieved either by
separating the MSA from the MTA, or by using a list of internal IP networks
for exclusion.
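
The exclusion idea can be illustrated with a short Python sketch. This is only a model of the check, not how amavisd-new implements it; the network ranges and the names INTERNAL_NETS and should_fingerprint are hypothetical:

```python
import ipaddress

# Hypothetical internal networks; replace with the site's own ranges.
INTERNAL_NETS = [
    ipaddress.ip_network(net) for net in ("192.0.2.0/24", "10.0.0.0/8")
]


def should_fingerprint(client_ip):
    """Return False when the SMTP client lies inside an internal network."""
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in INTERNAL_NETS)


if __name__ == "__main__":
    for ip in ("198.51.100.7", "10.1.2.3"):
        print(ip, "fingerprint" if should_fingerprint(ip) else "skip (internal)")
```

Only clients for which should_fingerprint() is true would then be given an OS-based score.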


A quick summary from amavisd-new-2.4.0 release notes:

- experimental support for passive operating system fingerprinting with
  the use of externally running utility p0f, supplying collected information
  as a header field to SpamAssassin, making it possible to add rules to score
  SMTP client hosts based on educated guess about their operating system
  type and IP distance; see below for details;

Here are the installation details:

- passive operating-system fingerprinting (p0f) support lets SA gain
  information about SMTP client's operating system and estimated IP distance,
  and can reduce the number of bounces:

  * find and install the p0f utility: http://lcamtuf.coredump.cx/p0f.shtml
    or in FreeBSD ports collection as 'net-mgmt/p0f';

  * start a p0f process on the same host where MTA (MX) is running, making
    it listen only to incoming TCP sessions (to reduce its workload) to the
    IP address and TCP port (25) where MTA is accepting incoming mail from
    outside (it doesn't hurt to let it see other traffic too, it just isn't
    needed); after testing p0f alone and seeing that it works, you may start
    it up, feeding its output to program p0f-analyzer.pl that comes with
    amavisd-new package, e.g.:

      p0f -l 'tcp dst port 25' 2>&1 | p0f-analyzer.pl 2345 &

    on multi-homed boxes one may need to specify interface and IP address
    where MTA is listening, the filter syntax is the same as in tcpdump, e.g.:

      p0f -l -i bge0 'dst host 192.0.2.66 and tcp dst port 25' 2>&1 \
        | p0f-analyzer.pl 2345 &

  * the program p0f-analyzer.pl reads p0f reports on stdin, keeps a cache
    for a limited time (10 minutes, configurable) of data about incoming TCP
    sessions organized by remote IP address, and listens on UDP port 2345
    (specified as its command line argument) for queries; only queries from
    allowed IP addresses are accepted and responded to, other queries are
    silently ignored - configure @inet_acl accordingly, defaults to 127.0.0.1;

  * adding the following line to amavisd.conf, matching the chosen port
    number to the one specified on the command line to the p0f-analyzer.pl:

      $os_fingerprint_method = 'p0f:127.0.0.1:2345';

    makes amavisd send queries to p0f-analyzer.pl (on the supplied IP address
    and UDP port number) to collect information about remote SMTP client's OS;
    collected response is then supplied as a header field when SpamAssassin
    is invoked; the query/response is very quick and imposes no burden on the
    amavisd process, nor does it extend its processing time. The
    $os_fingerprint_method setting is also a member of policy banks, which
    makes it easier to disable fingerprinting for mail from the site's own
    SMTP clients, e.g.:

      $policy_bank{'MYNETS'}{os_fingerprint_method} = undef;

  * one may now add scoring rules to SA local.cf file, e.g.:

    header L_P0F_WXP   X-Amavis-OS-Fingerprint =~ /^Windows XP/
    score  L_P0F_WXP   3.5
    header L_P0F_W     X-Amavis-OS-Fingerprint =~ /^Windows(?! XP)/
    score  L_P0F_W     1.7
    header L_P0F_UNKN  X-Amavis-OS-Fingerprint =~ /^UNKNOWN/
    score  L_P0F_UNKN  0.8
    header L_P0F_Unix  X-Amavis-OS-Fingerprint =~ /^((Free|Open|Net)BSD|Solaris|HP-UX|Tru64)/
    score  L_P0F_Unix  -1.0

    It is also possible to add a score based on estimated IP distance, for
    example to slightly favor nearer hosts (this is probably good for Europe
    or academic/university networks, and possibly less useful elsewhere):

    header L_P0F_D1234 X-Amavis-OS-Fingerprint =~ /\bdistance [1-4](?![0-9])/
    header L_P0F_D5    X-Amavis-OS-Fingerprint =~ /\bdistance 5(?![0-9])/
    header L_P0F_D6    X-Amavis-OS-Fingerprint =~ /\bdistance 6(?![0-9])/
    header L_P0F_D7    X-Amavis-OS-Fingerprint =~ /\bdistance 7(?![0-9])/
    header L_P0F_D8    X-Amavis-OS-Fingerprint =~ /\bdistance 8(?![0-9])/
    header L_P0F_D9    X-Amavis-OS-Fingerprint =~ /\bdistance 9(?![0-9])/
    header L_P0F_D10   X-Amavis-OS-Fingerprint =~ /\bdistance 10(?![0-9])/
    header L_P0F_D11   X-Amavis-OS-Fingerprint =~ /\bdistance 11(?![0-9])/
    score  L_P0F_D1234 -0.5
    score  L_P0F_D5    -0.5
    score  L_P0F_D6    -0.5
    score  L_P0F_D7    -0.5
    score  L_P0F_D8    -0.5
    score  L_P0F_D9    -0.4
    score  L_P0F_D10   -0.3
    score  L_P0F_D11   -0.3

  * make sure that @mynetworks is configured correctly, otherwise you will be
    inappropriately penalizing mail from internal hosts running Windows!
    Other methods to turn off fingerprinting for our own SMTP client hosts
    are to put $os_fingerprint_method in policy banks, and/or to specify
    a more selective packet filter on the p0f command line;

  * based on statistics, less than 0.7 % of mail coming from external
    Windows XP-based hosts is ham, yet 20 % of all spam is coming from
    external Windows XP hosts; amavisd-new suppresses bounces to external
    Windows XP hosts, reducing bounce pollution. The amavisd-agent utility
    now provides some additional statistics based on p0f information.

    Some statistics collected from our logs in February 2006:
    p0f OS guess    ham :   spam
    -----------------------------
    Windows-XP    0.7 % : 99.3 %
    Windows-2000  5.8 % : 94.2 %
    UNKNOWN      16.5 % : 83.5 %
    Linux        58.8 % : 41.2 %
    Unix         80.3 % : 19.7 %
    (Unix+Linux  66.5 % : 33.5 %)
      (ham: mail with score below 3,  spam: score above 6)
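
Before deploying rules like the local.cf examples above, it may help to try the patterns against a few header values. The Python sketch below does that for a subset of the rules. The sample header values are an assumption about what the X-Amavis-OS-Fingerprint field might look like, not copied from real logs, and matching_rules is an illustrative helper, not part of SA or amavisd-new:

```python
import re

# (rule name, pattern, score) transcribed from the local.cf examples above.
RULES = [
    ("L_P0F_WXP", r"^Windows XP", 3.5),
    ("L_P0F_W", r"^Windows(?! XP)", 1.7),
    ("L_P0F_UNKN", r"^UNKNOWN", 0.8),
    ("L_P0F_D1234", r"\bdistance [1-4](?![0-9])", -0.5),
    ("L_P0F_D10", r"\bdistance 10(?![0-9])", -0.3),
]

# Assumed header values, for illustration only.
SAMPLES = [
    "Windows XP SP1+, 2000 SP3, (distance 12, link: ethernet/modem)",
    "Windows 2000 SP4, XP SP1+, (distance 10, link: ethernet/modem)",
    "UNKNOWN, (distance 3, link: ethernet/modem)",
]


def matching_rules(header_value):
    """Return (name, score) for every rule whose pattern matches the value."""
    return [
        (name, score)
        for name, pattern, score in RULES
        if re.search(pattern, header_value)
    ]


if __name__ == "__main__":
    for value in SAMPLES:
        hits = matching_rules(value)
        total = sum(score for _, score in hits)
        print(f"{value!r}: {hits} total {total:+.1f}")
```

The negative lookahead (?![0-9]) is what keeps "distance 12" from triggering the rule for distances 1 to 4.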


Mark
