spamassassin-users mailing list archives

From "techlist06" <techlis...@msxc.com>
Subject RE: Bayes auto-learn - not happening
Date Thu, 10 Aug 2017 15:06:47 GMT
Update:  Still NOT working, but I'm giving it hell trying to figure out why :)

First, a couple of answers to others' questions:
- John, others: this is not an ISP. "High" is relative, I'm sure, but the volume is far too high
for me to duplicate and review every flagged message. Right now it's running at about 10% of its
eventual volume, before I migrate one of my larger domains. Mail is relayed to Exchange servers.
Users do not have IMAP accounts on the box; a few local users have POP only. I don't configure or
allow anyone to submit messages for training directly.

- Re: no auto-training, or careful auto-training: I get it. I'm migrating from a server that has run
for years with auto-learn on, set to conservative learn thresholds. Never had any trouble with it,
thank goodness. Looking at the messages that would have been auto-learned, I've never found one in
my corpus that was learned and should not have been. The volume is just too high to go through each
one myself. I have had "problem" users who get a lot of spam misses, and I plan to set up a way for
them to submit their spam to me (not auto-learn) for review and manual training as needed.

- Matus, re: "autolearn=unavailable apparently due to not accessible bayes database [due
to permissions]": I hope you are right. That would make sense to me. Please see below; I think
I've listed all the relevant settings. Config and permissions look good to me, but I'd be grateful
to have anything I missed pointed out by an experienced eye.

My old server, running embarrassingly old versions of everything, works great, so auto-learn
in general has been a good fit for my environment. I get that it's not for everyone. But at
least it SHOULD work, and let me choose to tweak it or turn it off. As far as I can tell, it is
not working at all.

So here's where I am:

1.  I stepped back and went through all my configurations carefully. SpamAssassin is run via
amavisd, as the amavis user. Site-wide config; no other users have direct access. POP accounts
and relay accounts only.

2.  From research before asking for help, I understood that no minimum amount of learned spam was
needed for auto-learn to work, but one person here said I had to reach the minimum (200 by default)
before it would. So, to rule that out as the issue, I manually fed it plenty of spam and ham. For
others who might read this thread in the archives: I had trouble getting enough messages learned
because of the default message size limit in my version of SA/sa-learn. With some digging I found
out how to raise that limit, and then I had plenty of spam to feed:
su amavis -c 'sa-learn -D --spam --showdots --max-size=1000000 --mbox /home/mail/spam'

[root@mail2 amavisd]# su amavis -c 'sa-learn --dump magic'
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime
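To rule out the training-count theory mechanically, here is a small Python sketch (a hypothetical helper of mine, not part of SpamAssassin) that parses the `--dump magic` output above and compares nspam/nham against bayes_min_spam_num and bayes_min_ham_num, which default to 200. As far as I understand, those minimums control when Bayes starts contributing BAYES_* scores, not whether auto-learning is attempted.

```python
from datetime import datetime, timezone

# Documented SpamAssassin defaults for bayes_min_spam_num / bayes_min_ham_num.
MIN_SPAM = 200
MIN_HAM = 200


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        columns, label = line.split("non-token data:", 1)
        values[label.strip()] = int(columns.split()[2])
    return values


def summarize(values):
    """One-line summary: counts, whether both minimums are met, atime range (UTC)."""
    ready = values["nspam"] >= MIN_SPAM and values["nham"] >= MIN_HAM
    oldest = datetime.fromtimestamp(values["oldest atime"], tz=timezone.utc)
    newest = datetime.fromtimestamp(values["newest atime"], tz=timezone.utc)
    return (f"nspam={values['nspam']} nham={values['nham']} minimums_met={ready} "
            f"atime={oldest:%Y-%m-%d}..{newest:%Y-%m-%d}")


# The dump from above, pasted verbatim.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        349          0  non-token data: nspam
0.000          0        478          0  non-token data: nham
0.000          0     166030          0  non-token data: ntokens
0.000          0 1501594564          0  non-token data: oldest atime
0.000          0 1502289189          0  non-token data: newest atime
"""

if __name__ == "__main__":
    print(summarize(parse_magic(DUMP)))
```

For the dump above this prints `nspam=349 nham=478 minimums_met=True atime=2017-08-01..2017-08-09`, so the database is past both minimums and has been touched recently.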

3.  Next up were questions about the config and permissions. I checked my setup and it looked
OK, but I even opened some directories up to 777 for testing.
This is my config; if anyone sees anything wrong, I'd be grateful if you'd point it out.
I include the amavis parts just to show that SpamAssassin is running, invoked as and by the amavis user.

3a. amavis
in /usr/lib/systemd/system/amavisd.service
User=amavis
Group=amavis
ExecStart=/usr/sbin/amavisd -c /etc/amavisd/amavisd.conf

> amavis user's home dir per /etc/passwd is:
/var/spool/amavisd
verified with cd ~amavis

3b. local.cf
> My spamassassin local.cf is at:
/etc/mail/spamassassin/local.cf

> verified this is the one being used by putting an error
> line in it and restarting amavisd. It complains about the error.
> Fixed, of course, and continued...

> in local.cf I have these related settings:
use_bayes               1
bayes_auto_learn        1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path              /etc/mail/bayes/bayes
bayes_file_mode         0777
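The thresholds can also be sanity-checked against the AutoLearnThreshold documentation, which notes that spam auto-learning needs at least 3 header and 3 body points, so a bayes_auto_learn_threshold_spam below 6 can never fire. A minimal Python sketch, assuming plain `key value` lines and `#` comments; it ignores include files, `ifplugin` blocks and user_prefs overrides, and the defaults it fills in are the documented ones (use_bayes 1, bayes_auto_learn 1, spam 12.0, nonspam 0.1):

```python
# Documented defaults, used when a setting is absent from local.cf.
DEFAULTS = {
    "use_bayes": "1",
    "bayes_auto_learn": "1",
    "bayes_auto_learn_threshold_spam": "12.0",
    "bayes_auto_learn_threshold_nonspam": "0.1",
}


def parse_cf(text):
    """Return settings with defaults applied; later lines override earlier ones."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def autolearn_problems(settings):
    """List settings that would prevent threshold-based auto-learning."""
    problems = []
    for flag in ("use_bayes", "bayes_auto_learn"):
        if settings[flag] != "1":
            problems.append(f"{flag} is {settings[flag]!r}, expected '1'")
    spam = float(settings["bayes_auto_learn_threshold_spam"])
    ham = float(settings["bayes_auto_learn_threshold_nonspam"])
    if spam < 6.0:
        problems.append(f"spam threshold {spam} < 6.0 (needs 3 header + 3 body points)")
    if ham >= spam:
        problems.append(f"nonspam threshold {ham} is not below spam threshold {spam}")
    return problems


LOCAL_CF = """\
use_bayes               1
bayes_auto_learn        1
bayes_auto_learn_threshold_nonspam -1.7
bayes_auto_learn_threshold_spam 10.0
bayes_path              /etc/mail/bayes/bayes
bayes_file_mode         0777
"""

if __name__ == "__main__":
    print(autolearn_problems(parse_cf(LOCAL_CF)) or "no obvious autolearn misconfiguration")
```

The settings above pass, which points away from the thresholds and toward the database side.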

3c. bayes
> for troubleshooting I set the permissions to 777 on /etc/mail/bayes and its files
> This is the only occurrence of the "bayes" files on the server
[root@mail2 amavisd]# ls -la /etc/mail/bayes
total 4196
drwxrwxrwx 2 amavis amavis    4096 Aug  9 13:49 .
drwxr-xr-x 4 amavis amavis    4096 Aug  3 13:02 ..
-rwxrwxrwx 1 amavis amavis   86016 Aug  9 09:51 bayes_seen
-rwxrwxrwx 1 amavis amavis 5246976 Aug  9 13:49 bayes_toks
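A 777 listing only shows mode bits, so it may help to test access as the account that actually runs SpamAssassin, e.g. `su amavis -c 'python3 check_bayes.py'` (the script name is hypothetical). The sketch below uses `os.access` on the bayes directory and the `_toks`/`_seen` files. The directory check matters because, as I understand SpamAssassin's DBM store, it creates its lock file next to the database. Two caveats: running it as root proves nothing, and a shell-level check cannot see restrictions that apply only to the daemon's own context (for example an SELinux policy, if one is enforcing).

```python
import os
import sys


def bayes_access_problems(bayes_path):
    """List what the *current* user cannot do with the files under bayes_path."""
    problems = []
    directory = os.path.dirname(bayes_path) or "."
    # Creating lock/temp files needs write + search permission on the directory.
    if not os.access(directory, os.W_OK | os.X_OK):
        problems.append(f"cannot create files in {directory}")
    for suffix in ("_toks", "_seen"):
        path = bayes_path + suffix
        if os.path.exists(path) and not os.access(path, os.R_OK | os.W_OK):
            problems.append(f"cannot read/write {path}")
    return problems


if __name__ == "__main__":
    # Default taken from bayes_path in local.cf; pass another prefix as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/mail/bayes/bayes"
    found = bayes_access_problems(target)
    print("\n".join(found) if found else f"{target}: OK for uid {os.getuid()}")
```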

3d. amavis spamassassin folder settings 
> For amavis, which calls SpamAssassin via its
> Perl libraries (I am not running spamd),
> the related configuration parts are:
$MYHOME = '/var/spool/amavisd';   # a convenient default for other settings, -H
$TEMPBASE = "$MYHOME/tmp";   # working directory, needs to exist, -T
$ENV{TMPDIR} = $TEMPBASE;    # environment variable TMPDIR, used by SA, etc.
$db_home   = "$MYHOME/db";        # dir for bdb nanny/cache/snmp databases, -D
#$helpers_home = "$MYHOME/var";  # working directory for SpamAssassin, -S
$helpers_home = "$MYHOME";  # working directory for SpamAssassin, -S

3e. spamassassin directory
> And SpamAssassin's files are placed in the amavisd home directory, as configured
> in amavisd.conf.
> I am careful to run sa-update and SA debug commands only as the amavis user, so as
> not to create any other .spamassassin folders under root, etc.
> this is the only occurrence of .spamassassin on the server:
[root@mail2 amavisd]# locate .spamassassin
/var/spool/amavisd/.spamassassin
/var/spool/amavisd/.spamassassin/user_prefs
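`locate` only searches its own database, which updatedb refreshes periodically, so a recently created stray `.spamassassin` directory or `bayes_*` file could be missing from its output. A hedged Python sketch that walks the filesystem directly; the default roots and the skipped pseudo-filesystems are my assumptions, and a walk of `/` can be slow:

```python
import os
import sys

SKIP = {"/proc", "/sys", "/dev", "/run"}  # pseudo-filesystems not worth scanning


def find_sa_state(root="/"):
    """Return sorted paths of .spamassassin dirs and bayes_toks/bayes_seen files."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped trees.
        dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) not in SKIP]
        hits += [os.path.join(dirpath, d) for d in dirnames if d == ".spamassassin"]
        hits += [os.path.join(dirpath, f) for f in filenames
                 if f.endswith(("bayes_toks", "bayes_seen"))]
    return sorted(hits)


if __name__ == "__main__":
    # Roots to scan: command-line arguments, else a few likely locations (assumption).
    for root in sys.argv[1:] or ["/etc", "/var", "/root", "/home"]:
        for path in find_sa_state(root):
            print(path)
```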

3f. amavis (spamassassin's user) home directory
[root@mail2 amavisd]# ls -la /var/spool/amavisd
total 32
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 .
drwxr-xr-x 8 root   root   4096 Nov  5  2016 ..
-rw------- 1 amavis amavis  101 Aug  9 11:17 .bash_history
-rw-r--r-- 1 amavis amavis    0 Aug  9 20:49 black.lst
drwxr-x--- 2 amavis amavis 4096 Aug  9 20:30 db
drwxr-x--- 2 amavis amavis 4096 Apr 19 07:28 quarantine
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .spamassassin
drwxr-x--- 5 amavis amavis 4096 Aug 10 08:26 tmp
-rw-r--r-- 1 amavis amavis   37 Aug  7 19:28 white.lst

3g.  .spamassassin folder
[root@mail2 amavisd]# ls -la /var/spool/amavisd/.spamassassin
total 12
drwx------ 2 amavis amavis 4096 Aug  8 15:32 .
drwxr-x--- 6 amavis amavis 4096 Aug  9 20:49 ..
-rw-r--r-- 1 amavis amavis 1869 Aug  8 15:32 user_prefs


4. Logging
I managed to configure amavisd to pass through the more verbose rule listing for the header, and
the score details in the log, for my troubleshooting as well.

5. Results:

With this config running and a loaded Bayes database, it has yet to auto-learn a single spam (or
ham). From yesterday alone, my spam quarantine has over 50 fairly high-scoring spams in it. I've
studied tflags and now understand what they are (for others, here's a good link):
http://commons.oreilly.com/wiki/index.php/SpamAssassin/SpamAssassin_Rules

I understand SA requires at least 3 points from the header and 3 points from the body to
auto-learn a message as spam. I understand some tflags exclude a test from the autolearn score.
I understand Bayes points don't count. But surely at least one of the 50 high scorers I caught
yesterday qualified. Yet there is no autolearn: always autolearn=unavailable or no. I've turned
on verbose debugging for Bayes, but I don't see any errors or any feedback on why it isn't learning.
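To reason about which of those 50 should have qualified, here is a simplified Python model of the threshold test as described in the AutoLearnThreshold documentation and the points above: Bayes rules and tflags-excluded rules don't count, spam needs the threshold plus 3 header and 3 body points, and ham needs a score below the nonspam threshold. It is a sketch, not SpamAssassin's code: the real plugin may apply further conditions, the `BAYES_` prefix check stands in for the tflags that exclude Bayes rules, and how each rule type maps to "header" or "body" is an assumption left to the caller. Note that it can only answer spam, ham or no; a message that passes this check and still logs autolearn=unavailable presumably failed later, when SpamAssassin tried to use the Bayes database, which would fit Matus's suggestion.

```python
from dataclasses import dataclass

REQUIRED_HEADER_POINTS = 3.0  # documented minimum for learning as spam
REQUIRED_BODY_POINTS = 3.0


@dataclass
class RuleHit:
    name: str
    score: float
    area: str                  # "header" or "body": assumed to be known by the caller
    noautolearn: bool = False  # True if the rule's tflags exclude it from autolearn


def autolearn_verdict(hits, spam_threshold=12.0, ham_threshold=0.1):
    """Simplified threshold check: returns 'spam', 'ham' or 'no'."""
    counted = [h for h in hits
               if not h.name.startswith("BAYES_") and not h.noautolearn]
    total = sum(h.score for h in counted)
    header = sum(h.score for h in counted if h.area == "header")
    body = sum(h.score for h in counted if h.area == "body")
    if total >= spam_threshold:
        if header >= REQUIRED_HEADER_POINTS and body >= REQUIRED_BODY_POINTS:
            return "spam"
        return "no"  # high score, but concentrated in header or body only
    if total < ham_threshold:
        return "ham"
    return "no"


if __name__ == "__main__":
    example = [
        RuleHit("BAYES_99", 3.5, "body"),            # ignored: Bayes points don't count
        RuleHit("HYPOTHETICAL_HDR", 4.0, "header"),  # made-up rule names
        RuleHit("HYPOTHETICAL_BODY", 6.5, "body"),
    ]
    # Thresholds from local.cf above.
    print(autolearn_verdict(example, spam_threshold=10.0, ham_threshold=-1.7))
```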

I looked at yesterday's log:

cat /var/log/maillog.1|grep autolearn=unavailable|wc -l
60
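Counting only `unavailable` hides how the other outcomes are distributed. A small Python sketch (a hypothetical helper) that tallies every `autolearn=` value in a log, the same job as running `grep -c` once per status:

```python
import re
import sys
from collections import Counter

# Matches "autolearn=<status>" but not "autolearn_force=" or "autolearnscore=".
STATUS_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_counts(lines):
    """Tally autolearn statuses (unavailable, no, spam, ham, ...) over log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 count_autolearn.py /var/log/maillog.1
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for status, n in autolearn_counts(log).most_common():
                print(f"{path}: {status:12} {n}")
```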

amavisd has the option of writing a verbose log line with all the scoring details, and amavis now
adds an "autolearn score" to the log as well. I'm not sure how that is calculated, but it's interesting
anyway. It would be great if it were split h/b/t (header/body/total). Anyway, a sample:

Aug 10 00:38:39 mail2 amavis[15959]: (15959-08) Blocked SPAM {DiscardedInbound,Quarantined},
[89.43.62.101]:47955 [89.43.62.101] ESMTP/LMTP <contact@hewis.versateye.com> -> <shorton@myvirt.org>,
(ESMTP://[89.43.62.101]:47955), quarantine: spam06@myvirt.org, Queue-ID: 7F64A70, mail_id:
yxtV5c7b1N8r, b: tDtWV84sR, Hits: 23.553, size: 365419, Subject: "Joanna Gaines Drops Bombshell.",
From: <contact@hewis.versateye.com>, helo=hewis.versateye.com, Tests: [BAYES_999=0.2,BAYES_99=3.5,DATE_IN_PAST_03_06=1.592,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,RAZOR2_CF_RANGE_51_100=0.5,RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,RDNS_NONE=0.793,SPF_HELO_PASS=-0.001,SPF_PASS=-0.001,STYLE_GIBBERISH=3.093,URIBL_ABUSE_SURBL=1.25,URIBL_BLACK=1.7],
autolearn=unavailable autolearn_force=no, autolearnscore=21.113, 5061 ms

As usual, autolearn=unavailable.  
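The Tests list in that line can be checked the same way. A hedged Python sketch (my own parsing, assuming the amavis log format shown above) that totals the listed rule scores with and without the BAYES_* rules:

```python
import re

TESTS_RE = re.compile(r"Tests: \[([^\]]*)\]")
HITS_RE = re.compile(r"Hits: (-?\d+(?:\.\d+)?)")


def rule_scores(line):
    """Parse 'Tests: [NAME=score,...]' into {name: score}."""
    match = TESTS_RE.search(line)
    if not match:
        return {}
    scores = {}
    for item in filter(None, match.group(1).split(",")):
        name, _, value = item.partition("=")
        scores[name.strip()] = float(value)
    return scores


def score_summary(line):
    """Reported Hits vs. the sum of listed rules, with and without BAYES_* rules."""
    scores = rule_scores(line)
    hits = HITS_RE.search(line)
    return {
        "hits": float(hits.group(1)) if hits else None,
        "rule_total": round(sum(scores.values()), 3),
        "non_bayes_total": round(sum(v for k, v in scores.items()
                                     if not k.startswith("BAYES_")), 3),
    }


SAMPLE = (
    "Hits: 23.553, size: 365419, Tests: [BAYES_999=0.2,BAYES_99=3.5,"
    "DATE_IN_PAST_03_06=1.592,DCC_CHECK=3.2,DIGEST_MULTIPLE=0.293,"
    "HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0.377,MIME_HTML_ONLY=0.723,"
    "MISSING_MID=0.497,NORMAL_HTTP_TO_IP=0.001,RAZOR2_CF_RANGE_51_100=0.5,"
    "RAZOR2_CF_RANGE_E8_51_100=1.886,RAZOR2_CHECK=2.5,RCVD_IN_BRBL_LASTEXT=1.449,"
    "RDNS_NONE=0.793,SPF_HELO_PASS=-0.001,SPF_PASS=-0.001,STYLE_GIBBERISH=3.093,"
    "URIBL_ABUSE_SURBL=1.25,URIBL_BLACK=1.7], autolearn=unavailable "
    "autolearn_force=no, autolearnscore=21.113, 5061 ms"
)

if __name__ == "__main__":
    print(score_summary(SAMPLE))
```

For this sample the listed rules add up to exactly the reported Hits (23.553), and to 19.853 without the two Bayes rules. That is not the logged autolearnscore=21.113; my guess is that the autolearn score is computed with SpamAssassin's non-Bayes score set, where per-rule scores differ, but I have not verified that. Either way, 21.113 is well above the 10.0 spam threshold in local.cf, so this message looks like a learning candidate, subject to the 3 header / 3 body point rule.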

My suspicion is that many of those "unavailable" results should have been learns. Surely, out of 60,
at least one was valid to auto-learn.

I don't know what to look at next. I'm sure hoping it's just a permissions issue.

I'm back to a brick wall.  How can I help you help me?  


