spamassassin-users mailing list archives

From Mark Martinec <Mark.Martinec...@ijs.si>
Subject Re: Recommendations for ASF SA Implementation
Date Fri, 20 Mar 2015 12:41:56 GMT
2015-03-17 22:16, Kevin A. McGrail wrote:

> I am working on recommendations for the ASF to modernize the
> installation of SA for the foundation.
> 
> We have some givens:
> 
> Using Ubuntu
> Using Postfix
> Need to stick with maintainable packages
> Likely needs to stay away from lots of tweaks and heavy customization
> such as using MIMEDefang (unfortunate).
> 
> So I'd like any input you might have, on or off list.


Axb wrote:
| Although I'd suggest Fuglu, the obvious choice should probably be amavisd-new
| considering Mark is also highly involved in SA dev work.
| It's also distributed by Ubuntu so it would be one package less to maintain
| outside the distro. We'd get the best of both worlds.
| Axb

Thanks, Amavis would be my choice too :)))


back to Kevin:
> Here's some questions I believe will help guide things:
> 
> Q1 - What is the best glue for SA for Postfix that does the following:
> 
> - uses spamc calls so that spamd's can be distributed and load 
> balanced?

Amavis uses the standard SMTP protocol for communication with an MTA
instead of the proprietary spamc/spamd protocol. Other than that, it
interfaces with SpamAssassin in much the same way as spamd does, i.e.
it uses a pre-forked set of processes which use the SpamAssassin library.
For this reason the performance is pretty much the same: the bottleneck
is rule processing in SpamAssassin.

> can be distributed and load balanced?

Yes, it can be distributed and load balanced. Two approaches are most
apparent:
- the classical approach is to run multiple postfix+amavis
combos on several hosts, and let MX DNS records distribute the load
across them. If a single IP address is desired, an SMTP proxy (such
as nginx) can do the task of load sharing in front of Postfix.
- if a single MTA is preferred with multiple content filters on
multiple hosts, then traffic from Postfix to amavisd instances
can be spread using HAProxy (or some other load balancer).

Note that it is beneficial to feed outgoing mail through amavis too
for the following reasons:
- the PenPals feature keeps track of ongoing conversations and
contributes negative score points to such, preventing some false
positives on marginal mail content (a requirement is a common
database for all amavis instances, preferably redis, possibly SQL);
- when SpamAssassin autolearning is enabled, outgoing mail
contributes its valuable share of ham samples;
- when an internal machine or a user mail account gets compromised
and starts spewing malware or spam, it will get detected and blocked;
- not to forget: to DKIM-sign outbound mail it needs to pass
through a signer. Amavisd can do DKIM signing (and verification).


> - can implement clamav before SA call

Yes.

Also, considering that some of the third-party ClamAV rulesets
are prone to false positives, or intentionally target spam (not
viruses and other malware), amavis can be configured to reclassify
certain malware (by name) as spam, contributing to SpamAssassin score
and not blocking as malware right away.
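
As a rough illustration of the reclassification idea above (not amavis
code or its configuration syntax), here is a minimal Python sketch. The
signature names, patterns and score values are hypothetical; real
third-party rulesets use their own naming schemes.

```python
import re
from typing import List, Tuple

# Hypothetical rules: scanner names matching a pattern are treated as
# spam indicators worth some score points instead of as malware.
RULES = [
    (re.compile(r"^ThirdParty\.Spam\."), 3.0),   # assumed naming scheme
    (re.compile(r"^ThirdParty\.Phish\."), 5.0),  # assumed naming scheme
]

def reclassify(names: List[str], rules=RULES) -> Tuple[float, List[str]]:
    """Split scanner hits into spam score points and real malware names."""
    points = 0.0
    malware = []
    for name in names:
        for pattern, score in rules:
            if pattern.search(name):
                points += score      # contributes to the spam score
                break
        else:
            malware.append(name)     # no rule matched: still malware
    return points, malware

if __name__ == "__main__":
    print(reclassify(["ThirdParty.Spam.Gen-1", "Win.Trojan.Example"]))
```

Only the names left in the second element would be handled as malware;
the points would be added to the SpamAssassin score.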


> - should silently discard emails if a virus is detected

Configurable, but you don't want to do that, and (as Reindl Harald
noted) doing so may even violate the law. Unwanted mail must be rejected
at the SMTP level (or delivered to a dedicated folder or quarantine);
it must not be lost. Amavis is nowadays typically deployed as a
before-queue Postfix content filter so that it can reject mail
while the original session is still open.

Keep in mind that antivirus software does occasionally produce
false positives, ClamAV with third-party rules even more so.
A legitimate sender must be notified if this happens.


> - must use clamdscan but ideally can utilize some sort of socket
> solution for clamd to run distributed and load balanced

Can do.  Amavisd can interface with clamd either through
clamdscan, or (preferably) by directly talking to it over
the clamd protocol (thus eliminating clamdscan from the setup).
As this is a normal TCP connection, it can be load balanced
using HAProxy, although it probably makes more sense to keep
amavis+clamd pairs on each host.
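
To show what talking to clamd directly looks like on the wire, here is a
minimal Python sketch of clamd's INSTREAM command: the command itself,
then length-prefixed chunks (4-byte length in network byte order), then
a zero-length chunk as terminator. This is not how amavisd implements
it; the host, port 3310 and chunk size are assumptions for a clamd
listening on TCP.

```python
import socket
import struct
from typing import Iterator

def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte frames that make up one INSTREAM request."""
    yield b"zINSTREAM\0"                          # null-terminated command
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)                    # zero length ends the stream

def scan(data: bytes, host: str = "127.0.0.1", port: int = 3310,
         timeout: float = 10.0) -> str:
    """Send data to a clamd TCP socket and return its one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:                          # connection closed
                break
            reply += part
    return reply.rstrip(b"\0").decode("ascii", "replace")
```

Because this is a plain TCP exchange, a load balancer can sit between
the client and several clamd instances without either side knowing.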


> - should bound email over a certain threshold (let's say 5) and
> silently discard email over a certain threshold for SA (let's say 10)

Possible. There are a couple of configurable spam score levels,
each with its configurable action:

   tag level  - adds X-Spam-* headers (ham or spam)
   tag2 level - adds X-Spam-* headers, claims it is spam
   tag3 level - adds X-Spam-* headers, claims it is blatant spam
   kill level - (typically) rejects mail (or can discard or deliver)

Quarantining at each spam level is configurable independently.
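
The way the four levels stack can be sketched in a few lines of Python.
This only illustrates the threshold logic described above; the numeric
values and the action strings are hypothetical, not amavis defaults.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class SpamLevels:
    # All values are assumptions chosen for illustration only.
    tag: float = -999.0   # effectively "always add headers"
    tag2: float = 5.0     # considered spam
    tag3: float = 8.0     # considered blatant spam
    kill: float = 10.0    # final action (reject here)

def actions_for(score: float, levels: SpamLevels = SpamLevels()) -> List[str]:
    """Return every action triggered by a given spam score."""
    actions = []
    if score >= levels.tag:
        actions.append("add X-Spam-* headers")
    if score >= levels.tag2:
        actions.append("mark as spam")
    if score >= levels.tag3:
        actions.append("mark as blatant spam")
    if score >= levels.kill:
        actions.append("reject")
    return actions

if __name__ == "__main__":
    for score in (1.2, 6.0, 12.5):
        print(score, actions_for(score))
```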


> - Might use a few RBLs to decline connections to start

Yes. That belongs to Postfix.


> - Implements a good implementation of greylisting

That belongs to Postfix.
I tend to shy away from greylisting, it is much less effective
than it used to be initially. In my opinion it does more harm than good.


> - Temporary failure for scanning (virus or spam) failures

Yes. Any fatal/unrecoverable failure causes an SMTP temporary failure
(4xx response either from amavis or from an MTA). No mail can get lost.
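
The fail-safe rule above (a scanner failure must never turn into lost
or silently passed mail) can be sketched in Python as follows. The
verdict names, the scanner callables and the exact reply codes are
assumptions for illustration, not amavis internals.

```python
from enum import Enum
from typing import Callable, Iterable

class Verdict(Enum):
    CLEAN = "clean"
    BLOCKED = "blocked"
    SCAN_FAILED = "scan_failed"

def run_scanners(message: bytes,
                 scanners: Iterable[Callable[[bytes], bool]]) -> Verdict:
    """Run each scanner; a scanner returns True if the mail is unwanted."""
    try:
        for scanner in scanners:
            if scanner(message):
                return Verdict.BLOCKED
    except Exception:
        # Any scanner error means we cannot vouch for the mail.
        return Verdict.SCAN_FAILED
    return Verdict.CLEAN

def smtp_reply(verdict: Verdict) -> int:
    """Map a verdict to an SMTP reply code class."""
    if verdict is Verdict.CLEAN:
        return 250   # accepted
    if verdict is Verdict.BLOCKED:
        return 554   # permanent reject, the sender's MTA is told
    return 451       # temporary failure, the sender's MTA retries later
```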


> Q2 - Do we happen to know who maintains SA for Ubuntu so we can try
> and work to make sure the upcoming release of 3.4.1 is packaged?

No idea. I thought the ASF infrastructure runs on FreeBSD mostly.


> Here's the high level draft if anyone has some thoughts:
> 
> - Implement a cluster of spamd servers with no Bayes but likely using
> SQL prefs for some whitelist/blacklisting - Bayes not being used
> because training and maintaining will likely be too difficult

I find bayes with autolearning very valuable (using redis backend,
mostly maintenance-free). Probably not so good at some general public
mail provider, but certainly good for a scope of users sharing mostly
technically oriented / common interests mail.

> - Implement txrep with SQL backend

Haven't tried txrep yet.

> - Implement a cluster of clamav boxes

ClamAV is usually faster than SpamAssassin. I'd keep several instances
of amavis+SpamAssassin+clamd (with or without a Postfix instance)
on multiple hosts if the load is really that high.

> - Implement an SPF record

Yes, an unfortunate fact of life.

Not to forget, DKIM signing is essential and must be done *after*
mailing list fanout.

> - Implement postfix with xyz glue to test email on a scalable # of mx's

Sure.

> - Implement a few RBLs to block SMTP connections - I hate to recommend
> this but ASF members are very sensitive to spam so I'm treading
> lightly

Some high-quality RBLs at an MTA level are desired.
Postfix even implements weighting with a threshold
over multiple RBLs if desired.
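
The weighting idea can be illustrated with a short Python sketch: each
list a client appears on contributes its weight, and the connection is
refused only when the sum reaches a threshold. This shows the logic
only, not Postfix configuration; the zone names, weights and threshold
are hypothetical.

```python
from typing import Dict, Set

# Hypothetical zones and weights; a negative weight models an allow list.
WEIGHTS: Dict[str, int] = {
    "rbl-a.example": 2,
    "rbl-b.example": 1,
    "rbl-c.example": 1,
    "allow.example": -2,
}
THRESHOLD = 2  # assumed value

def dnsbl_score(listed_on: Set[str], weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of all lists that report the client address."""
    return sum(w for zone, w in weights.items() if zone in listed_on)

def should_block(listed_on: Set[str], threshold: int = THRESHOLD) -> bool:
    """Block only when the combined weight reaches the threshold."""
    return dnsbl_score(listed_on) >= threshold

if __name__ == "__main__":
    print(should_block({"rbl-b.example"}))                   # one weak hit
    print(should_block({"rbl-b.example", "rbl-c.example"}))  # two weak hits
```

With weights like these, a single hit on a less trusted list is not
enough to refuse a connection, which limits the damage from one list's
false positive.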


For a high-level view on Amavis see the Wikipedia article:

   http://en.wikipedia.org/wiki/Amavis


Perhaps I should point out some more features that I find valuable:
- amavis can block mail based on declared MIME content type or MIME name,
   or based on a MIME part's content as classified by a file(1) utility.
   This helps with first waves of malware before virus scanners get their
   signatures updated, e.g. block MS executables;
- produces detailed logging in JSON (in addition to syslog). JSON logging
   can be valuable for effectively feeding into Elasticsearch/Logstash/Kibana
   or into Splunk or other log analyzers;
- large mail (over the SpamAssassin's limit) is not just blindly passed,
   but a truncated section of mail is passed to SpamAssassin for evaluation,
   with DKIM signature checks already done on the full pristine mail content,
   so that truncation does not invalidate signatures, yet in many cases
   SpamAssassin can still do its job reasonably well.


Mark
