Delivered-To: mailing list users@spamassassin.apache.org
From: "techlist06"
Subject: RE: Bayes auto-learn - not happening
Date: Thu, 10 Aug 2017 10:06:47 -0500
Message-ID: <023301d311ea$49935c50$dcba14f0$@com>

Update: Still NOT working, but I'm giving it hell trying to figure out why :)

First, a couple of answers to others' questions:

- John, others: not an ISP. "High" is relative, I'm sure, but the volume is too high for me to duplicate and review every flagged message. Right now I'm running at about 10%, before I migrate one of my larger domains. Mail is relayed to Exchange servers. Users do not have IMAP accounts on the box. A few local users with POP only. I don't configure or allow anyone to submit messages for training directly.

- Re: no, or careful, auto-training. I get it. I'm migrating from a server that's run for years with auto-learn on, set at conservative learn values. Never had any trouble with it, thank goodness.
As I look at the messages that would be autolearned, I've never found one in my corpus that would have been learned and should not have been. The volume would just be too high to personally go through each one of them myself. I have had "problem" users that get a lot of spam misses, and I plan to set up a way for them to submit their spam to me (not autolearn) for review and manual training as needed.

- Matus: re: "autolearn=unavailable apparently due to not accessible bayes database [due to permissions]". I hope you are right. That would make sense to me. See below please. I think I listed them all. Config and permissions look good to me; I'm grateful to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything, works great. So auto-learn in general has been a good fit for my environment. I get it that it's not for everyone. But at least it SHOULD work, and let me choose to tweak it or turn it off. As far as I can tell it is not working, at all.

So here's where I am:

1. I stepped back and went through all my configurations carefully. spamassassin is being run via amavisd, as the amavis user. Site-wide config, no other users have direct access. POP accounts and relay accounts only.

2. From prior research before asking for help, I understood no previously learned spam was necessary for auto-learn to work, but one person here said I had to be at the minimum (200 default) before it would. So, to rule that out as the issue, I manually fed it plenty of spam and ham. For others who might read this thread in the archive: I was having trouble getting enough learned due to the default size limit my version of SA/sa-learn had.
With some digging I found out how to raise that limit, and then I had plenty of spam to feed:

su amavis -c 'sa-learn -D --spam --showdots --max-size=1000000 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime

3. Next up were questions about the config and permissions. I checked my setup and it looked OK, but I even opened some directories up to 777 for testing.

This is my config. I'd be grateful if anyone who sees anything wrong would point it out. I include the amavis stuff just to show it is running and invoked as and by the amavis user.

3a. amavis
in /usr/lib/systemd/system/amavisd.service

User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf

> amavis user's home dir per /etc/passwd is: /var/spool/amavisd
  verified with cd ~amavis

3b. local.cf
> My spamassassin local.cf is at: /etc/mail/spamassassin/local.cf
> verified this is the one being used by putting an error
> line in it and restarting amavisd. It complains about the error.
> Fixed, of course, and continued...
> in local.cf I have these related settings:

use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path /etc/mail/bayes/bayes
bayes_file_mode 0777

3c. bayes
> for troubleshooting I set the permissions to 777 on /etc/mail/bayes and its files
> This is the only occurrence of the "bayes" files on the server

[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks

3d. amavis spamassassin folder settings
> For amavis, which is calling spamassassin via its
> perl libraries (I am not running spamd),
> I have its related configuration parts as:

$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";        # working directory, needs to exist, -T
$ENV{TMPDIR} = $TEMPBASE;         # environment variable TMPDIR, used by SA, etc.
$db_home = "$MYHOME/db";          # dir for bdb nanny/cache/snmp databases, -D
#$helpers_home = "$MYHOME/var";   # working directory for SpamAssassin, -S
$helpers_home = "$MYHOME";        # working directory for SpamAssassin, -S

3e. spamassassin directory
> And for spamassassin, its files are being placed in the amavisd home directory as configured in amavisd.conf.
> I am careful to only run sa-update, or SA debug commands, as the amavis user so as not to create any other
> .spamassassin folders under root, etc.
> This is the only occurrence of .spamassassin on the server:

[root@mail2 amavisd]# locate .spamassassin
/var/spool/amavisd/.spamassassin
/var/spool/amavisd/.spamassassin/user_prefs

3f. amavis (spamassassin's user) home directory

[root@mail2 amavisd]# ls -la /var/spool/amavisd
total 32
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 .
drwxr-xr-x 8 root   root   4096 Nov  5  2016 ..
-rw------- 1 amavis amavis  101 Aug  9 11:17 .bash_history
-rw-r--r-- 1 amavis amavis    0 Aug  9 20:49 black.lst
drwxr-x--- 2 amavis amavis 4096 Aug  9 20:30 db
drwxr-x--- 2 amavis amavis 4096 Apr 19 07:28 quarantine
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .spamassassin
drwxr-x--- 5 amavis amavis 4096 Aug 10 08:26 tmp
-rw-r--r-- 1 amavis amavis   37 Aug  7 19:28 white.lst

3g. .spamassassin folder

[root@mail2 amavisd]# ls -la /var/spool/amavisd/.spamassassin
total 12
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 ..
-rw-r--r-- 1 amavis amavis 1869 Aug  8 15:32 user_prefs

4. Logging
I managed to get amavisd configured to let the more verbose rule listing for the header, and the score details in the log, come through for my troubleshooting as well.

5. Results:
After running this config now, with a loaded bayes database, it has yet to auto-learn a single spam (or ham). Just through yesterday my spam quarantine has over 50 pretty high-scoring spams in it. I've studied tflags and now understand what they are (for others, here's a good link):
http://commons.oreilly.com/wiki/index.php/SpamAssassin/SpamAssassin_Rules

I understand SA requires at least 3 points from the header and 3 points from the body to auto-learn as spam. I understand some tflags preclude the use of the test in the autolearn score. I understand bayes points don't count. But surely one of the 50 high scores I caught yesterday qualified. Yet, no autolearn. Always autolearn=unavailable or no. I've turned on verbose debugging for bayes but I don't see any errors or feedback on reasons for the no-learn.

Looked at yesterday's log:

cat /var/log/maillog.1|grep autolearn=unavailable|wc -l
60

Now amavisd has the option of giving a verbose log line with all the score stuff. Now amavis adds an "autolearn score" to the log as well. Not sure how that is calculated, but it's interesting anyway. It would be great if it were h/b/t (header/body/total). Anyway, sample:
Anyway, sample: Aug 10 00:38:39 mail2 amavis[15959]: (15959-08) Blocked SPAM = {DiscardedInbound,Quarantined}, [89.43.62.101]:47955 [89.43.62.101] = ESMTP/LMTP -> , = (ESMTP://[89.43.62.101]:47955), quarantine: spam06@myvirt.org, Queue-ID: = 7F64A70, mail_id: yxtV5c7b1N8r, b: tDtWV84sR, Hits: 23.553, size: = 365419, Subject: "Joanna Gaines Drops Bombshell.", From: = , helo=3Dhewis.versateye.com, Tests: = [BAYES_999=3D0.2,BAYES_99=3D3.5,DATE_IN_PAST_03_06=3D1.592,DCC_CHECK=3D3.= 2,DIGEST_MULTIPLE=3D0.293,HTML_MESSAGE=3D0.001,HTML_MIME_NO_HTML_TAG=3D0.= 377,MIME_HTML_ONLY=3D0.723,MISSING_MID=3D0.497,NORMAL_HTTP_TO_IP=3D0.001,= RAZOR2_CF_RANGE_51_100=3D0.5,RAZOR2_CF_RANGE_E8_51_100=3D1.886,RAZOR2_CHE= CK=3D2.5,RCVD_IN_BRBL_LASTEXT=3D1.449,RDNS_NONE=3D0.793,SPF_HELO_PASS=3D-= 0.001,SPF_PASS=3D-0.001,STYLE_GIBBERISH=3D3.093,URIBL_ABUSE_SURBL=3D1.25,= URIBL_BLACK=3D1.7], autolearn=3Dunavailable autolearn_force=3Dno, = autolearnscore=3D21.113, 5061 ms As usual, autolearn=3Dunavailable. =20 My suspicion is many of those "unavailable" should have been a learn. = Surely out of 60, one was valid to autolearn.=20 I don't know what to look for next to troubleshoot. Sure hoping it's = just a permissions issue. I'm back to a brick wall. How can I help you help me? =20