spamassassin-users mailing list archives

From "Don Levey" <spamassas...@the-leveys.us>
Subject RE: Autolearn=failed when BAYES_00 is only rule hit
Date Fri, 01 Apr 2005 16:00:52 GMT
Don Levey wrote:
> Please forgive me if this is in the archives; I'm having trouble
> finding it.
>
> I've just finished training my Bayes DB using sa-learn (perversely,
> when I was trying to collect 200 spam messages, the spammers decided
> to stop sending to me).  Now that the DB is usable, it's interesting
> that while most ham messages produce at least one small rule hit and
> a negative Bayes score that results in "Autolearn=no", when BAYES_00
> is the ONLY rule that hits I get "Autolearn=failed".
>
> Two quick questions:
> 1) What should I do about this, and
> 2) Should I worry, or just ignore it?
>
> TIA,
>  -Don

I may have found part of the answer, at least for the "autolearn=no"
portion of the question.  Running a message through
"spamassassin -D --mbox < msgfile" ends with the following debug lines:

debug: running body-text per-line regexp tests; score so far=8.886
debug: running uri tests; score so far=8.886
debug: running raw-body-text per-line regexp tests; score so far=8.886
debug: running full-text regexp tests; score so far=8.886
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 8.886, computed score for autolearn: 7.223
debug: auto-learn? ham=0.1, spam=12, body-points=3.1, head-points=3.64, learned-points=-1.096
debug: auto-learn? no: inside auto-learn thresholds, not considered ham or spam
debug: is spam? score=8.886 required=5
debug: tests=BAYES_40,DATE_IN_FUTURE_03_06,FORGED_YAHOO_RCVD,MIME_HEADER_CTYPE_ONLY,NO_OBLIGATION,SUBJ_LIFE_INSURANCE,URIBL_OB_SURBL,URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,__HAS_SUBJECT,__MSGID_OK_DIGITS,__MSGID_OK_HEX,__MSGID_OK_HOST,__RCVD_IN_NJABL,__RCVD_IN_SORBS,__RFC_IGNORANT_ENVFROM,__SANE_MSGID


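So somewhere my configuration says that a message must score above 12 to
be autolearned as spam, and below 0.1 to be autolearned as ham.  In this
run the message's recomputed autolearn score was 7.223 (8.886 overall),
which falls between the two thresholds, hence "autolearn=no".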
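
For anyone finding this in the archives, here is a minimal Python sketch of
the threshold comparison as I read it from the debug output.  It is not
SpamAssassin's actual code: the function name is made up, the handling of a
score exactly on a threshold is an assumption, and it ignores the
body-points and head-points values that the debug line also reports.

```python
def autolearn_decision(score, ham_threshold=0.1, spam_threshold=12.0):
    """Classify a recomputed autolearn score as 'ham', 'spam', or 'no'.

    Simplified illustration only; the thresholds are the values shown in
    the "auto-learn? ham=0.1, spam=12" debug line.
    """
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    # Between the two thresholds: neither ham nor spam is learned.
    return "no"


print(autolearn_decision(7.223))   # autolearn score from the debug run above
```

With the value from the debug run above, the function returns "no", which
matches the "inside auto-learn thresholds" message.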

The next step was to try a message that had a score greater than 12.  I saw
that on the example I chose, I also got "autolearn=failed" in the header.
Running the same debug command line, I got:

debug: running body-text per-line regexp tests; score so far=15.837
debug: running uri tests; score so far=15.837
debug: running raw-body-text per-line regexp tests; score so far=15.837
debug: running full-text regexp tests; score so far=15.837
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 15.837, computed score for autolearn: 13.387
debug: auto-learn? ham=0.1, spam=12, body-points=11.404, head-points=5.843, learned-points=0.001
debug: auto-learn? yes, spam (13.387 > 12)
debug: Learning Spam
<debug tokenizing messages removed for brevity>
debug: bayes: 20664 untie-ing
debug: bayes: 20664 untie-ing db_toks
debug: bayes: 20664 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 20664 unlink /etc/mail/spamassassin/bayes_db.lock
debug: is spam? score=15.837 required=5
debug: tests=BAYES_50,FORGED_YAHOO_RCVD,MIME_HEADER_CTYPE_ONLY,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_XBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,__HAS_SUBJECT,__MSGID_OK_HOST,__RCVD_IN_SBL_XBL,__RFC_IGNORANT_ENVFROM,__SANE_MSGID

As the output shows, the message WAS autolearned this time, and the
headers generated by this run do contain "autolearn=spam".  I ran this as
the same user that runs spamd (the platform is Fedora, where the
spamassassin "service" runs spamd).

I had been hoping the debug run would reveal an error, but everything
worked.  Checking my maillog, however, hit paydirt:

Apr  1 09:40:01 davinci spamd[9864]: connection from davinci.example.com [127.0.0.1] at port 41609
Apr  1 09:40:01 davinci spamd[9864]: info: setuid to root succeeded
Apr  1 09:40:01 davinci spamd[9864]: Still running as root: user not specified with -u, not found, or set to root.  Fall back to nobody.
Apr  1 09:40:01 davinci spamd[9864]: processing message <1112366187.5824.28.camel@localhost.localdomain> for root:99.
Apr  1 09:40:01 davinci spamd[9864]: bayes expire_old_tokens: lock: 9864 cannot create tmp lockfile /etc/mail/spamassassin/bayes_db.lock.davinci.example.com.9864 for /etc/mail/spamassassin/bayes_db.lock: Permission denied
Apr  1 09:40:01 davinci spamd[9864]: cannot write to /etc/mail/spamassassin/bayes_db_journal, Bayes db update ignored: Permission denied
Apr  1 09:40:07 davinci spamd[9864]: clean message (-4.9/5.0) for root:99 in 6.1 seconds, 3079 bytes.
Apr  1 09:40:07 davinci spamd[9864]: result: . -4 - BAYES_00 scantime=6.1,size=3079,mid=<1112366187.5824.28.camel@localhost.localdomain>,bayes=0,autolearn=failed
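
If your maillog is large, a short script can pick out entries like these.
The following is only a sketch: the function name and the two marker
strings are my own choices based on the lines above, and it assumes each
log entry is on a single line, as it is in the real maillog.

```python
def find_autolearn_problems(lines):
    """Return log lines that mention a failed autolearn or a permission error."""
    markers = ("autolearn=failed", "Permission denied")
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in markers)]


# Three shortened entries modeled on the maillog excerpt above.
sample = [
    "spamd[9864]: info: setuid to root succeeded\n",
    "spamd[9864]: cannot write to /etc/mail/spamassassin/bayes_db_journal, "
    "Bayes db update ignored: Permission denied\n",
    "spamd[9864]: result: . -4 - BAYES_00 bayes=0,autolearn=failed\n",
]

for hit in find_autolearn_problems(sample):
    print(hit)
```

To scan a real log, pass an open file object instead of the sample list;
the location of the maillog varies by system.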


Note the permissions error when creating the lock file.  The permissions
on the /etc/mail/spamassassin directory do not allow the user that spamd
runs as to write the lock file (the log shows spamd falling back to
"nobody").  I've fixed this at least temporarily while I sort out the
user ID situation, and autolearning now works.
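
A quick way to check for the same problem is to test whether the Bayes
directory is writable by the account in question.  This is a minimal
sketch, with a made-up function name; it checks the permissions of
whichever user runs the script, so it has to be run as the user spamd
actually ends up as.

```python
import os


def can_create_lockfile(directory):
    """Return True if the current user may create files in `directory`.

    Creating a lock file needs write and search (execute) permission on
    the directory itself.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


# Directory taken from the log above; adjust to wherever your Bayes DB lives.
bayes_dir = "/etc/mail/spamassassin"
print(bayes_dir, "writable:", can_create_lockfile(bayes_dir))
```

If this prints False for the spamd user, the lock file and journal writes
will fail in the way the maillog shows.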

Why am I telling you all of this?  Because someone you know may be in a
similar situation, or *you* may be in a similar situation.  This at least
gets the info in the archives (perhaps again) so that it may be found.

Thanks for your time,
 -Don
