Mailing-List: contact users-help@spamassassin.apache.org; run by ezmlm
Precedence: bulk
From: "techlist06" <techlist06@msxc.com>
To: <users@spamassassin.apache.org>
References: 
In-Reply-To: 
Subject: RE: Bayes auto-learn - not happening
Date: Thu, 10 Aug 2017 10:06:47 -0500
Message-ID: <023301d311ea$49935c50$dcba14f0$@com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Thread-Index: AdMQcQjiifYQj50aTua1hRsXBnQoQABda9fA
Content-Language: en-us
archived-at: Thu, 10 Aug 2017 15:07:07 -0000

Update: Still NOT working, but I'm giving it hell trying to figure out why :)

First, a couple of answers to others' questions:
- John, others: I'm not an ISP. "High" is relative, I'm sure, but the volume is much higher than I could duplicate and review message by message. Right now I'm running at about 10%, before I migrate one of my larger domains. Mail is relayed to Exchange servers. Users do not have IMAP accounts on the box; a few local users have POP only. I don't configure or allow anyone to submit messages for training directly.

- Re: no, or careful, auto-training. I get it. I'm migrating from a server that has run for years with auto-learn on, set at conservative learn values. Never had any trouble with it, thank goodness. Looking at the messages in my corpus that would be auto-learned, I've never found one that should not have been. The volume would just be too high to go through each one of them myself. I have had "problem" users who get a lot of spam misses, and I plan to set up a way for them to submit their spam to me (not auto-learn) for review and manual training as needed.

- Matus: re: "autolearn=unavailable apparently due to not accessible bayes database [due to permissions]". I hope you are right. That would make sense to me. Please see below; I think I listed them all. Config and permissions look good to me, and I'd be grateful to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything, works great. So auto-learn in general has been a good fit for my environment. I get that it's not for everyone. But at least it SHOULD work, and let me choose to tweak it or turn it off. As far as I can tell, it is not working at all.

So here's where I am:

1. I stepped back and went through all my configurations carefully. SpamAssassin is being run via amavisd, as the amavis user. Site-wide config; no other users have direct access. POP accounts and relay accounts only.

2. From prior research before asking for help, I understood that no previously learned spam was necessary for auto-learn to work, but one person here said I had to be at the minimum (200 default) before it would. So, to rule that out as the issue, I manually fed it plenty of spam and ham. For others who might read this thread in the archive: I was having trouble getting enough learned because of the default size limit in my version of SA/sa-learn. With some digging I found out how to raise that limit, and then I had plenty of spam to feed:
su amavis -c 'sa-learn -D --spam --showdots --max-size=1000000 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime
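
For anyone checking the same thing, here is a minimal sketch that reads `sa-learn --dump magic` output and compares nspam and nham against the 200 minimum mentioned above. The function name check_bayes_counts is made up for illustration, and the 200 default is only an assumption taken from this thread; pass a different value if your bayes_min_spam_num / bayes_min_ham_num differ.

```shell
# Hypothetical helper (not part of SpamAssassin): reads the output of
# `sa-learn --dump magic` on stdin and reports the learned message counts.
# Returns 0 only if both nspam and nham reach the minimum (default 200).
check_bayes_counts() {
  min="${1:-200}"
  awk -v min="$min" '
    /non-token data: nspam/ { nspam = $3 }   # third column holds the count
    /non-token data: nham/  { nham  = $3 }
    END {
      printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
      exit !(nspam >= min && nham >= min)
    }'
}
```

Usage, run the same way as the dump above: su amavis -c 'sa-learn --dump magic' | check_bayes_counts 200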

3. Next up were questions about the config and permissions. I checked my setup and it looked OK, but I even opened some directories up to 777 for testing.
This is my config; I'd be grateful if anyone who sees anything wrong would point it out.
I include the amavis settings just to show it is running as, and invoked by, the amavis user.

3a. amavis
in /usr/lib/systemd/system/amavisd.service
User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf

> amavis user's home dir per /etc/passwd is:
/var/spool/amavisd
verified with cd ~amavis

3b. local.cf
> My spamassassin local.cf is at:
/etc/mail/spamassassin/local.cf

> verified this is the one being used by putting an error
> line in it and restarting amavisd. It complains about the error.
> Fixed, of course, and continued...

> in local.cf I have these related settings:
use_bayes               1
bayes_auto_learn        1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path              /etc/mail/bayes/bayes
bayes_file_mode         0777

3c. bayes
> for troubleshooting I set the permissions to 777 on /etc/mail/bayes and its files
> This is the only occurrence of the "bayes" files on the server
[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks
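
A listing shows the mode bits, but not whether the scanning user can actually read and write the files. As a rough sketch (the function name check_writable is hypothetical), the following tests a directory and everything directly inside it from the point of view of whoever runs it, so it only tells you something useful when run from a shell started as the amavis user.

```shell
# Hypothetical helper: prints "ok" or "FAIL" for a directory and each
# entry directly inside it, depending on whether the *current* user can
# both read and write it. Returns non-zero if anything fails.
check_writable() {
  rc=0
  for p in "$1" "$1"/*; do
    [ -e "$p" ] || continue          # skip the unexpanded glob on an empty dir
    if [ -r "$p" ] && [ -w "$p" ]; then
      echo "ok   $p"
    else
      echo "FAIL $p"
      rc=1
    fi
  done
  return $rc
}
```

Usage (as the amavis user): check_writable /etc/mail/bayes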

3d. amavis spamassassin folder settings
> For amavis, which is calling spamassassin via its
> perl libraries (I am not running spamd),
> I have its related configuration parts as:
$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";   # working directory, needs to exist, -T
$ENV{TMPDIR} = $TEMPBASE;    # environment variable TMPDIR, used by SA, etc.
$db_home   = "$MYHOME/db";        # dir for bdb nanny/cache/snmp databases, -D
#$helpers_home = "$MYHOME/var";  # working directory for SpamAssassin, -S
$helpers_home = "$MYHOME";  # working directory for SpamAssassin, -S

3e. spamassassin directory
> And for spamassassin, its files are being placed in the amavisd home directory as configured in amavisd.conf.
> I am careful to run sa-update and SA debug commands only as the amavis user, so as not to create any other
> .spamassassin folders under root, etc.
> this is the only occurrence of .spamassassin on the server:
[root@mail2 amavisd]# locate .spamassassin
/var/spool/amavisd/.spamassassin
/var/spool/amavisd/.spamassassin/user_prefs

3f. amavis (spamassassin's user) home directory
[root@mail2 amavisd]# ls -la /var/spool/amavisd
total 32
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 .
drwxr-xr-x 8 root   root   4096 Nov  5  2016 ..
-rw------- 1 amavis amavis  101 Aug  9 11:17 .bash_history
-rw-r--r-- 1 amavis amavis    0 Aug  9 20:49 black.lst
drwxr-x--- 2 amavis amavis 4096 Aug  9 20:30 db
drwxr-x--- 2 amavis amavis 4096 Apr 19 07:28 quarantine
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .spamassassin
drwxr-x--- 5 amavis amavis 4096 Aug 10 08:26 tmp
-rw-r--r-- 1 amavis amavis   37 Aug  7 19:28 white.lst

3g.  .spamassassin folder
[root@mail2 amavisd]# ls -la /var/spool/amavisd/.spamassassin
total 12
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 ..
-rw-r--r-- 1 amavis amavis 1869 Aug  8 15:32 user_prefs


4. Logging
I managed to get amavisd configured to let the more verbose rule listing for the header, and the score details in the log, come through for my troubleshooting as well.

5. Results

After running this config with a loaded bayes database, it has yet to auto-learn a single spam (or ham). Yesterday alone, my spam quarantine collected over 50 pretty high-scoring spams. I've studied tflags and now understand what they are (for others, here's a good link):
http://commons.oreilly.com/wiki/index.php/SpamAssassin/SpamAssassin_Rules


I understand SA requires at least 3 points from the header and 3 points from the body to auto-learn as spam. I understand some tflags exclude a test from the autolearn score. I understand bayes points don't count. But surely one of the 50 high scorers I caught yesterday qualified. Yet, no autolearn. Always autolearn=unavailable or no. I've turned on verbose debugging for bayes, but I don't see any errors or feedback on the reasons for the no-learn.
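
To make the rule as described above concrete, here is a deliberately simplified sketch of the spam-side check: autolearn score at or above the spam threshold, and at least 3 points each from header and body rules. The function name would_autolearn_spam is hypothetical, the head and body point values are inputs you have to supply yourself, and this is not SpamAssassin's code; the real decision is made inside SpamAssassin's AutoLearnThreshold plugin and may take other factors into account.

```shell
# Simplified illustration only, not SpamAssassin's implementation.
# Args: autolearn_score head_points body_points [spam_threshold]
# Returns 0 if the message would pass the spam-side checks described above.
would_autolearn_spam() {
  awk -v s="$1" -v h="$2" -v b="$3" -v t="${4:-12.0}" \
    'BEGIN { exit !(s >= t && h >= 3 && b >= 3) }'
}
```

Usage, with the threshold from my local.cf and made-up head/body values: would_autolearn_spam 21.113 5 6 10.0 && echo "would qualify"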

Looked at yesterday's log:

cat /var/log/maillog.1|grep autolearn=unavailable|wc -l
60
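
A slightly broader view than a single count: this sketch tallies every autolearn= value found in a log file, so "unavailable", "no", "ham", "spam" and so on can be compared side by side. The function name autolearn_tally is made up; it relies only on grep, sort and uniq, and assumes the log contains autolearn=<value> as in the sample line further down.

```shell
# Hypothetical helper: count each distinct autolearn=<value> in a log file,
# most frequent first. autolearn_force=... is not matched because of the "_".
autolearn_tally() {
  grep -o 'autolearn=[a-z]*' "$1" | sort | uniq -c | sort -rn
}
```

Usage: autolearn_tally /var/log/maillog.1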

Now amavisd has the option of giving a verbose log line with all the score details, and it adds an "autolearnscore" to the log as well. I'm not sure how that is calculated, but it's interesting anyway. It would be great if it were h/b/t (header/body/total). Anyway, a sample:

Aug 10 00:38:39 mail2 amavis[15959]: (15959-08) Blocked SPAM {DiscardedInbound,Quarantined}, [89.43.62.101]:47955 [89.43.62.101] ESMTP/LMTP <contact@hewis.versateye.com> -> <shorton@myvirt.org>, (ESMTP://[89.43.62.101]:47955), quarantine: spam06@myvirt.org, Queue-ID: 7F64A70, mail_id: yxtV5c7b1N8r, b: tDtWV84sR, Hits: 23.553, size: 365419, Subject: "Joanna Gaines Drops Bombshell.", From: <contact@hewis.versateye.com>, helo=hewis.versateye.com, Tests: [BAYES_999=0.2,BAYES_99=3.5,DATE_IN_PAST_03_06=1.592,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,RDNS_NONE=0.793,SPF_HELO_PASS=-0.001,SPF_PASS=-0.001,STYLE_GIBBERISH=3.093,URIBL_ABUSE_SURBL=1.25,URIBL_BLACK=1.7], autolearn=unavailable autolearn_force=no, autolearnscore=21.113, 5061 ms
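
As a rough cross-check on a log line like the sample above, this sketch pulls the Tests: [...] list out of the line and sums every score except the BAYES_* rules. The function name sum_non_bayes is hypothetical. It is not how amavisd or SpamAssassin computes autolearnscore: as far as I know, the autolearn calculation uses the non-Bayes score set and skips rules by tflags, so per-rule scores can differ from the logged ones and the result need not match.

```shell
# Hypothetical helper: reads one amavis log line on stdin and prints the
# sum of the scores listed in "Tests: [...]", skipping BAYES_* rules.
sum_non_bayes() {
  sed -n 's/.*Tests: \[\([^]]*\)\].*/\1/p' |   # keep only the rule=score list
    tr ',' '\n' |                              # one rule=score pair per line
    awk -F= '$1 !~ /^BAYES_/ { sum += $2 } END { printf "%.3f\n", sum }'
}
```

Usage: grep 'mail_id: yxtV5c7b1N8r' /var/log/maillog.1 | sum_non_bayes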

As usual, autolearn=unavailable.

My suspicion is that many of those "unavailable" results should have been a learn. Surely out of 60, one was valid to autolearn.

I don't know what to look for next to troubleshoot. Sure hoping it's just a permissions issue.

I'm back at a brick wall. How can I help you help me?