Subject: Re: Tuning recommendations?
To: users@spamassassin.apache.org
References: <715ec153-6cf3-9034-eafe-606175b8f03d@camerontech.com>
From: thomas cameron
Message-ID: <312bec8d-6d46-f755-4edc-ee92028c1eda@camerontech.com>
Date: Mon, 12 Sep 2016 19:02:39 -0500

On 09/12/2016 02:32 PM, John Hardin wrote:
> On Mon, 12 Sep 2016, thomas cameron wrote:
>
>> On 09/12/2016 01:06 PM, John Hardin wrote:
>>> On Mon, 12 Sep 2016, thomas cameron wrote:
>>>
>>>
>>> Make sure you have a local recursing (**NOT** forwarding) DNS server
>>> that your MTA and SA are configured to use. Reason: if you're forwarding
>>> your MTA DNS requests to your ISP's DNS server, the aggregated traffic
>>> of you plus all the other ISP clients can exceed the various DNSBL and
>>> URIBL free-usage limits, rendering those tools useless.
>>
>> [root@mail-west ~]# grep recurs /etc/named.conf
>> allow-recursion { 127.0.0.1; };
>>
>>> A clear indicator this is happening: URIBL_BLOCKED hits.
>>
>> I see "URIBL_BLACK Contains an URL listed in the URIBL blacklist" in the
>> headers of many of the messages that got through. Is that what you mean?
>
> No. URIBL_BLACK indicates your URIBL queries are succeeding, that's a
> hit.
> URIBL_BLOCKED means "request blocked", probably due to exceeding
> the limits.

OK, thanks.

>>> Train up your Bayes using hand-vetted spam *and* ham, at least 200 of
>>> each. Using autolearn initially can be problematic, so disable that
>>> until SA is doing a fairly good job using hand-trained Bayes. Then you
>>> can let autolearn keep it up-to-date if you like, and continue to
>>> capture and manually train any persistent misses or near-misses.
>>> Generally the more you feed Bayes the better it performs, but it must be
>>> accurately classified. If you feed garbage to Bayes, you'll get garbage
>>> results.
>>
>> Good to know, thanks. I am running sa-learn --ham --mbox $MAIL now. I've
>> been running sa-learn --spam against the spam messages I've moved to my
>> spam folder, but forgot to teach it about ham.
>
> It's a really bad idea to train your inbox as ham. There may be stuff
> (specifically, FNs) in there you haven't seen yet or haven't removed.
> Keep a separate train-as-ham folder that you manually populate after
> actually looking at the messages, just like you're keeping a
> train-as-spam folder.
>
> You might want to wipe and retrain from scratch after setting that up,
> especially if you're seeing low BAYES score hits on spams and FPs.

I can certainly do that.

> Are you seeing any BAYES rule hits at all yet?

Yes, including a fair number of BAYES_999 and BAYES_99, which I would
have thought would have more weight than they apparently do. I know I
can custom score in local.cf, but I've always read that I should avoid
changing default scores unless I *really* know what I'm doing. Clearly,
I'm not there yet.

>>> Keep hand-classified Bayes corpora around in case you ever need to wipe
>>> and retrain from scratch.
>>
>> OK.
>>
>>> Ensure you're training Bayes as the user that SA is running under.
>>> Training the wrong Bayes database is a common cause of problems.
>>
>> It's a small server, so I'm doing this via procmail and spamc.
>> Everything runs in the context of the individual users. I need to run
>> sa-learn --ham as each user against their inboxes, I guess. I can add
>> cron jobs for each user to do that.
>
> You might also consider running a shared/global Bayes, if all your
> users' mail streams are fairly similar w/r/t "what is ham?" There should
> be instructions in the SA wiki for setting up shared/global Bayes.

I used to run SA via spamass-milter, and use a single Bayes DB under
user spam, but when I downsized my server, the hassle of feeding that
shared DB became bigger than the benefit. I will revisit that conclusion.

>>> Consider doing some MTA-level DNSBL checks. The Zen DNSBL is
>>> well-regarded. If you're using Postfix then there are some emails from
>>> Reindl Harald on this list regarding weighted DNSBL scoring that you may
>>> find useful. You'll have to search the archives to find those.
>>
>> I'm using sendmail, and I have these checks on:
>>
>> FEATURE(`dnsbl',`in.dnsbl.org ')dnl
>> FEATURE(`dnsbl',`sbl-xbl.spamhaus.org')dnl
>> FEATURE(`dnsbl',`cbl.abuseat.org')dnl
>>
>> I will add FEATURE(`dnsbl',`zen.spamhaus.org')dnl to it.
>
> Zen incorporates a couple of the ones you're already using, don't double
> up.

OK, good to know.

>>> There are some other MTA-level checks you can perform, like greet pause
>>> and HELO validation (e.g. reject if the HELO has no dots).
>>
>> Like this? http://www.harker.com/sendmail/checkhelo.html
>
> Here's greet pause:
>
> FEATURE(`greet_pause',3000)dnl

This is very helpful, thanks!

> I use milter-regex for HELO checks, it's a lot easier than hacking
> sendmail.cf (pokes sigmonster). You might consider milter-regex and take
> a look at this:
>
> http://www.impsec.org/~jhardin/antispam/milter-regex.conf
>
> There are some things in there specific to a very small install, for
> example I expect all mail legitimately from my domain to be coming in
> from localhost so a HELO in my domain on the real IP is always bogus.
> Don't just adopt that config blindly.
>
>>> Consider greylisting.
>>
>> I am using milter-greylist, and it is very helpful. A lot of these
>> messages are actually skipping greylisting, though!
>
> Greylisting isn't a panacea. There *are* spambots who retry, and
> spammers who send through real MTAs. It helps reduce the cheap
> anklebiters, though.
>
>> X-Greylist: Sender passed SPF test, not delayed by
>> milter-greylist-4.5.16 (XXX [XXX.XXX.XXX.XXX]); Mon, 12 Sep 2016
>> 18:11:18 +0000 (UTC)
>
> You might not want to bypass greylisting based on SPF. If the sender is
> using a spam domain, they could easily set up "accept from 0.0.0.0/0" in
> that domain's SPF.

Disabled SPF passthrough for greylisting, we'll see if it helps.

>> Keep the tips coming, I appreciate learning from you!
>
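
For anyone who wants to check the URIBL_BLOCKED vs. URIBL_BLACK distinction across saved messages, here is a rough Python sketch. It is only an illustration, not anything from SA itself: the function names rule_names and uribl_state are made up for this example, and it assumes the X-Spam-Status value has a "tests=" list of upper-case rule names (optionally with "=score" and square brackets, as amavisd-new writes them) followed by "autolearn=".

```python
import re

# Assumption: rule names are upper-case letters, digits and underscores,
# optionally followed by "=score" (the amavisd-new style).
RULE = re.compile(r"\b([A-Z][A-Z0-9_]+)(?:=-?[\d.]+)?")


def rule_names(status: str) -> set[str]:
    """Return the rule names found in the tests= part of an X-Spam-Status value."""
    _, found, rest = status.partition("tests=")
    if not found:
        return set()
    # Drop everything from "autolearn=" onward so only the rule list is scanned.
    rest = rest.split("autolearn=")[0]
    return set(RULE.findall(rest))


def uribl_state(status: str) -> str:
    """Classify a header value: 'blocked', 'hit', or 'none'."""
    names = rule_names(status)
    if "URIBL_BLOCKED" in names:
        return "blocked"  # the URIBL query was refused, likely over the free limit
    if "URIBL_BLACK" in names:
        return "hit"      # the query worked and the URL is listed
    return "none"


if __name__ == "__main__":
    sample = "Yes, score=7.1 required=5.0 tests=BAYES_99,URIBL_BLACK autolearn=no"
    print(uribl_state(sample))
```

If "blocked" shows up on more than the odd message, that points back at the DNS setup (forwarding instead of recursing) rather than at scoring.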