Mailing-List: contact users-help@spamassassin.apache.org; run by ezmlm
Precedence: bulk
Subject: Re: Tuning recommendations?
To: users@spamassassin.apache.org
References: <e6a63009-84db-9add-882f-f19075a1e5ae@camerontech.com>
 <alpine.LNX.2.00.1609121052260.11663@athena.impsec.org>
 <715ec153-6cf3-9034-eafe-606175b8f03d@camerontech.com>
 <alpine.LNX.2.00.1609121213000.11663@athena.impsec.org>
From: thomas cameron <thomas.cameron@camerontech.com>
Message-ID: <312bec8d-6d46-f755-4edc-ee92028c1eda@camerontech.com>
Date: Mon, 12 Sep 2016 19:02:39 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <alpine.LNX.2.00.1609121213000.11663@athena.impsec.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 13 Sep 2016 00:02:52 -0000

On 09/12/2016 02:32 PM, John Hardin wrote:
> On Mon, 12 Sep 2016, thomas cameron wrote:
> 
>> On 09/12/2016 01:06 PM, John Hardin wrote:
>>> On Mon, 12 Sep 2016, thomas cameron wrote:
>>>
>>>
>>> Make sure you have a local recursing (**NOT** forwarding) DNS server
>>> that your MTA and SA are configured to use. Reason: if you're forwarding
>>> your MTA DNS requests to your ISP's DNS server, the aggregated traffic
>>> of you plus all the other ISP clients can exceed the various DNSBL and
>>> URIBL free-usage limits, rendering those tools useless.
>>
>> [root@mail-west ~]# grep recurs /etc/named.conf
>>     allow-recursion { 127.0.0.1; };
>>
>>> A clear indicator this is happening: URIBL_BLOCKED hits.
>>
>> I see "URIBL_BLACK Contains an URL listed in the URIBL blacklist" in the
>> headers of many of the messages that got through. Is that what you mean?
> 
> No. URIBL_BLACK indicates your URIBL queries are succeeding, that's a
> hit. URIBL_BLOCKED means "request blocked", probably due to exceeding
> the limits.

OK, thanks.
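
A minimal sketch of how to tell the two rules apart across a whole mailbox, assuming SpamAssassin's default X-Spam-Status header is present and lists rule names after "tests=". The function names and the mbox path are hypothetical, not part of SA.

```python
import mailbox
import re
from collections import Counter

# The two rule names discussed in this thread.
RULES = ("URIBL_BLOCKED", "URIBL_BLACK")


def count_rule_hits(status_headers):
    """Count how many X-Spam-Status values mention each rule of interest."""
    counts = Counter({rule: 0 for rule in RULES})
    for value in status_headers:
        # Split into word-like tokens so folded (multi-line) headers and
        # comma-separated rule lists are handled the same way.
        tokens = set(re.findall(r"[A-Za-z0-9_]+", value or ""))
        for rule in RULES:
            if rule in tokens:
                counts[rule] += 1
    return counts


def status_headers_from_mbox(path):
    """Collect the X-Spam-Status header of every message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    return [msg.get("X-Spam-Status", "") for msg in box]


# Hypothetical usage; the path is an assumption:
# print(count_rule_hits(status_headers_from_mbox("/var/mail/example")))
```

Any nonzero URIBL_BLOCKED count would suggest the queries are being refused rather than answered.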

>>> Train up your Bayes using hand-vetted spam *and* ham, at least 200 of
>>> each. Using autolearn initially can be problematic, so disable that
>>> until SA is doing a fairly good job using hand-trained Bayes. Then you
>>> can let autolearn keep it up-to-date if you like, and continue to
>>> capture and manually train any persistent misses or near-misses.
>>> Generally the more you feed Bayes the better it performs, but it must be
>>> accurately classified. If you feed garbage to Bayes, you'll get garbage
>>> results.
>>
>> Good to know, thanks. I am running sa-learn --ham --mbox $MAIL now. I've
>> been running sa-learn --spam against the spam messages I've moved to my
>> spam folder, but forgot to teach it about ham.
> 
> It's a really bad idea to train your inbox as ham. There may be stuff
> (specifically, FNs) in there you haven't seen yet or haven't removed.
> Keep a separate train-as-ham folder that you manually populate after
> actually looking at the messages, just like you're keeping a
> train-as-spam folder.
> 
> You might want to wipe and retrain from scratch after setting that up,
> especially if you're seeing low BAYES score hits on spams and FPs.

I can certainly do that.
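
A rough sketch of the wipe-and-retrain sequence, assuming hand-vetted mail is kept in two mbox files. The folder paths and function names are hypothetical; the sa-learn options used (--clear, --ham, --spam, --mbox) are the standard ones.

```python
import subprocess


def sa_learn_commands(ham_mbox, spam_mbox, wipe_first=False):
    """Build the sa-learn invocations for hand-vetted ham and spam folders."""
    commands = []
    if wipe_first:
        # --clear wipes the existing Bayes database before retraining.
        commands.append(["sa-learn", "--clear"])
    commands.append(["sa-learn", "--ham", "--mbox", ham_mbox])
    commands.append(["sa-learn", "--spam", "--mbox", spam_mbox])
    return commands


def run_training(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical usage, run as the user whose Bayes database SA reads:
# run_training(sa_learn_commands("mail/train-as-ham", "mail/train-as-spam",
#                                wipe_first=True))
```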

> Are you seeing any BAYES rule hits at all yet?

Yes, including a fair number of BAYES_999 and BAYES_99 hits, which I would
have thought would carry more weight than they apparently do. I know I
can set custom scores in local.cf, but I've always read that I should avoid
changing default scores unless I *really* know what I'm doing. Clearly,
I'm not there yet.

>>> Keep hand-classified Bayes corpora around in case you ever need to wipe
>>> and retrain from scratch.
>>
>> OK.
>>
>>> Ensure you're training Bayes as the user that SA is running under.
>>> Training the wrong Bayes database is a common cause of problems.
>>
>> It's a small server, so I'm doing this via procmail and spamc.
>> Everything runs in the context of the individual users. I need to run
>> sa-learn --ham as each user against their inboxes, I guess. I can add
>> cron jobs for each user to do that.
> 
> You might also consider running a shared/global Bayes, if all your
> users' mail streams are fairly similar w/r/t "what is ham?" There should
> be instructions in the SA wiki for setting up shared/global Bayes.

I used to run SA via spamass-milter with a single Bayes DB under
user spam, but when I downsized my server, the hassle of feeding that
shared DB became bigger than the benefit. I will revisit that conclusion.

>>> Consider doing some MTA-level DNSBL checks. The Zen DNSBL is
>>> well-regarded. If you're using Postfix then there are some emails from
>>> Reindl Harald on this list regarding weighted DNSBL scoring that you may
>>> find useful. You'll have to search the archives to find those.
>>
>> I'm using sendmail, and I have these checks on:
>>
>> FEATURE(`dnsbl',`in.dnsbl.org')dnl
>> FEATURE(`dnsbl',`sbl-xbl.spamhaus.org')dnl
>> FEATURE(`dnsbl',`cbl.abuseat.org')dnl
>>
>> I will add FEATURE(`dnsbl',`zen.spamhaus.org')dnl to it.
> 
> Zen incorporates a couple of the ones you're already using, don't double
> up.

OK, good to know.

>>> There are some other MTA-level checks you can perform, like greet pause
>>> and HELO validation (e.g. reject if the HELO has no dots).
>>
>> Like this? http://www.harker.com/sendmail/checkhelo.html
> 
> Here's greet pause:
> 
>     FEATURE(`greet_pause',3000)dnl

This is very helpful, thanks!

> I use milter-regex for HELO checks, it's a lot easier than hacking
> sendmail.cf (pokes sigmonster). You might consider milter-regex and take
> a look at this:
> 
>   http://www.impsec.org/~jhardin/antispam/milter-regex.conf
> 
> There are some things in there specific to a very small install, for
> example I expect all mail legitimately from my domain to be coming in
> from localhost so a HELO in my domain on the real IP is always bogus.
> Don't just adopt that config blindly.
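
The two HELO rules mentioned above (no dots, and a HELO in your own domain arriving from outside) can be sketched as follows. This only illustrates the logic and is not milter-regex syntax; the function name, its parameters, and the example domain are hypothetical.

```python
def helo_looks_bogus(helo, local_domain=None, client_is_local=False):
    """Return True if a HELO/EHLO argument fails two simple sanity checks."""
    name = helo.strip().rstrip(".").lower()

    # Address literals such as [192.0.2.1] or [IPv6:::1] are not host names;
    # this sketch leaves them to other checks.
    if name.startswith("[") and name.endswith("]"):
        return False

    # Rule 1: a host name with no dots (this also catches an empty HELO).
    if "." not in name:
        return True

    # Rule 2: a remote client claiming to be a host in our own domain.
    # Only sensible on small installs where local mail arrives via localhost.
    if local_domain and not client_is_local:
        domain = local_domain.lower()
        if name == domain or name.endswith("." + domain):
            return True

    return False


# Hypothetical usage:
# helo_looks_bogus("mail.example.com", local_domain="example.com")
```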
> 
>>> Consider greylisting.
>>
>> I am using milter-greylist, and it is very helpful. A lot of these
>> messages are actually skipping greylisting, though!
> 
> Greylisting isn't a panacea. There *are* spambots who retry, and
> spammers who send through real MTAs. It helps reduce the cheap
> anklebiters, though.
> 
>> X-Greylist: Sender passed SPF test, not delayed by
>> milter-greylist-4.5.16 (XXX [XXX.XXX.XXX.XXX]); Mon, 12 Sep 2016
>> 18:11:18 +0000 (UTC)
> 
> You might not want to bypass greylisting based on SPF. If the sender is
> using a spam domain, they could easily set up "accept from 0.0.0.0/0" in
> that domain's SPF.

I've disabled the SPF bypass for greylisting; we'll see if it helps.
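
To illustrate why an SPF pass proves little on its own, here is a minimal sketch that flags a record authorizing every sender. It assumes the SPF TXT record has already been fetched as a string. The function name is hypothetical, and the parsing is deliberately simplified: it looks only at "all", "ip4:" and "ip6:" terms and ignores include/redirect.

```python
import ipaddress


def spf_allows_everyone(record):
    """Return True if an SPF record passes mail from any IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False

    # Mechanisms are evaluated left to right; the first match wins.
    for term in terms[1:]:
        mechanism = term.lower()
        qualifier = "+"  # "+" (pass) is the default qualifier
        if mechanism[0] in "+-~?":
            qualifier, mechanism = mechanism[0], mechanism[1:]

        if mechanism == "all":
            # "all" matches every sender, so its qualifier decides the result.
            return qualifier == "+"

        if qualifier == "+" and mechanism.startswith(("ip4:", "ip6:")):
            try:
                network = ipaddress.ip_network(mechanism[4:], strict=False)
            except ValueError:
                continue  # malformed address; skip it in this sketch
            if network.prefixlen == 0:
                # 0.0.0.0/0 or ::/0 covers the whole address space.
                return True

    return False
```

A record for which this returns True would sail through an SPF-based greylisting bypass no matter which host sends the mail.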

>> Keep the tips coming, I appreciate learning from you!
>