spamassassin-users mailing list archives

From Reindl Harald <h.rei...@thelounge.net>
Subject Re: newbie questions: sought, sa-learn, rule weights
Date Sun, 18 Oct 2015 09:29:16 GMT


On 18.10.2015 at 06:35, frederik@ofb.net wrote:
> I'm concerned that the BAYES_* rules aren't showing up in my spam
> headers

you are almost certainly training the wrong Bayes database instead of the 
one belonging to the user SpamAssassin runs as

> and would like to know if there's a good way to look at the
> tokens in the database

there is no way at all: the database stores only truncated hashes of the 
tokens, not the tokens themselves
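To illustrate why the tokens cannot be recovered, here is a minimal Python sketch. It assumes, as the 10-hex-digit values in the dump suggest, that each token is stored as the last 5 bytes (40 bits) of its SHA-1 digest; check your SpamAssassin version's source before relying on that. The helper names are hypothetical. A hash like this is one-way: all you can do is hash a guessed token and see whether it shows up in the dump.

```python
import hashlib

def truncated_token_hash(token: str, nbytes: int = 5) -> str:
    """Hex form of the last `nbytes` bytes of the SHA-1 digest of `token`.

    Assumption: this mirrors how the Bayes store keys its tokens.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-nbytes:].hex()

def find_known_tokens(candidates, dumped_hashes):
    """Return {guessed_token: hash} for guesses whose hash is in the dump."""
    wanted = set(dumped_hashes)
    found = {}
    for token in candidates:
        h = truncated_token_hash(token)
        if h in wanted:
            found[token] = h
    return found
```

Even this only works if a guess matches the exact form the tokenizer produced (case, prefixes on header tokens and so on), so in practice it is of little help for browsing the database.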

> When I do "sa-learn --dump data", I see a file
> with lines like this:
>
> 0.987          1          0 1436496897  0315e1da7f
> 0.016          0          1 1410284743  0320ba06ef
> 0.987          1          0 1393199297  0329ec4e6e
> 0.003          0          5 1268403253  03541effbc
> 0.008          0          2 1398222936  038d6e997d
> 0.016          0          1 1429567309  041cabf4ef
> 0.016          0          1 1431638107  041d441c1b
>
> Is that normal?

yes
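Assuming the usual column order of `sa-learn --dump data` (the token's spam probability, the number of spam and ham messages it was seen in, its last access time as a Unix timestamp, and the hashed token), a small Python sketch can parse these lines and make the timestamp readable. `TokenRow` and `parse_dump_line` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TokenRow:
    prob: float      # spam probability for messages containing this token
    nspam: int       # number of spam messages the token was seen in
    nham: int        # number of ham messages the token was seen in
    atime: datetime  # last access time, converted from a Unix timestamp
    token: str       # hashed token, printed as hex

def parse_dump_line(line: str) -> TokenRow:
    # maxsplit=4 keeps the last column intact even if it contains spaces
    prob, nspam, nham, atime, token = line.split(maxsplit=4)
    return TokenRow(float(prob), int(nspam), int(nham),
                    datetime.fromtimestamp(int(atime), tz=timezone.utc),
                    token)

sample = """\
0.987          1          0 1436496897  0315e1da7f
0.003          0          5 1268403253  03541effbc
"""
for row in map(parse_dump_line, sample.splitlines()):
    print(row.token, row.prob, row.nspam, row.nham, row.atime.date())
```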

> How do I get at the actual tokens?

you don't

> How do I see how it scores a test message, just the Bayesian part?

you see BAYES_00 through BAYES_999 in the mail headers
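With SpamAssassin's default headers, the names of the rules that fired are listed after `tests=` in the `X-Spam-Status` header, so the Bayes result can be picked out there. A minimal Python sketch using the standard `email` package; `bayes_rules` is a hypothetical helper and the example header is made up, assuming the default header layout.

```python
import email
import re

def bayes_rules(raw_message: str) -> list[str]:
    """Return the BAYES_* rule names listed in X-Spam-Status, if any."""
    msg = email.message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    # The header may be folded over several lines; a regex copes with that.
    return re.findall(r"\bBAYES_\d+\b", status)

example = (
    "X-Spam-Status: Yes, score=9.2 required=5.0 tests=BAYES_99,BAYES_999,\n"
    "\tHTML_MESSAGE autolearn=no version=3.4.1\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(bayes_rules(example))
```

An empty list means no Bayes rule fired for that message, which fits the diagnosis in this thread that the database being trained is not the one being used for scanning.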

> I find that I get a lot
> of spam with exactly the same lines in the body of the message, and
> the Bayesian classifier doesn't seem to register it.

as said above: you are training the wrong Bayes database

> Here's the output of sa-learn --dump magic:
>
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      15466          0  non-token data: nspam
> 0.000          0      30317          0  non-token data: nham
> 0.000          0    1733267          0  non-token data: ntokens
> 0.000          0 1098575745          0  non-token data: oldest atime
> 0.000          0 1441160002          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0 1441160455          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count

FROM WHAT USER?
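The question matters because, by default, each Unix user gets a separate Bayes database under their home directory (`bayes_path` defaults to `~/.spamassassin/bayes`, and the DBM store keeps files such as `bayes_toks` and `bayes_seen` there). If `sa-learn` runs as one user and `spamd` scans as another, training and scanning use different databases. A small Python sketch that lists the default files for given users, assuming that default path and a DBM-backed store; it does not apply if `bayes_path` is overridden or an SQL store is used. `default_bayes_files` and `home_of` are hypothetical helpers.

```python
import os
import pwd
import sys

def default_bayes_files(home: str) -> dict[str, bool]:
    """Map the assumed default Bayes DB files under `home` to whether they exist."""
    prefix = os.path.join(home, ".spamassassin", "bayes")
    return {prefix + suffix: os.path.exists(prefix + suffix)
            for suffix in ("_toks", "_seen")}

def home_of(username: str) -> str:
    """Home directory of a Unix user, from the password database."""
    return pwd.getpwnam(username).pw_dir

if __name__ == "__main__":
    # Pass the user you run sa-learn as and the user spamd runs as.
    for user in sys.argv[1:]:
        print(user, default_bayes_files(home_of(user)))
```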

> I couldn't find a sample output on your Wiki, with which to compare
> this; I'm worried about the 0.000 lines and other zeroes.

they are normal
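In the `--dump magic` output the actual value of each counter sits in the third column; the zeros around it are unused fields. A rough Python sketch that extracts the counters and checks them against the minimum training SpamAssassin wants before it uses Bayes at all, assuming the defaults of 200 for both `bayes_min_spam_num` and `bayes_min_ham_num`. With 15466 spam and 30317 ham, the amount of training is not the problem. `parse_magic` is a hypothetical helper.

```python
from datetime import datetime, timezone

PREFIX = "non-token data: "

def parse_magic(text: str) -> dict[str, int]:
    """Map each 'non-token data' label to the value in its third column."""
    values = {}
    for line in text.splitlines():
        fields = line.split(maxsplit=4)
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            values[fields[4][len(PREFIX):]] = int(fields[2])
    return values

# Assumed defaults for bayes_min_spam_num / bayes_min_ham_num.
MIN_SPAM = MIN_HAM = 200

magic = """\
0.000          0      15466          0  non-token data: nspam
0.000          0      30317          0  non-token data: nham
0.000          0 1098575745          0  non-token data: oldest atime
"""
info = parse_magic(magic)
print("enough training:", info["nspam"] >= MIN_SPAM and info["nham"] >= MIN_HAM)
oldest = datetime.fromtimestamp(info["oldest atime"], tz=timezone.utc)
print("oldest token access:", oldest.date())
```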

> I'm also thinking that I should employ some kind of sender address
> whitelisting using e.g. TxRep. Most of my spam is stuff that I'm
> receiving for the first time from a particular sender, and there are a
> lot of strings that I can say for sure I'd never find in a Subject
> line of a message from a friend who is emailing me for the first time:
> "ATTN", "stock tip"... All of the mail I send is Bcc'ed to myself, is
> there a way to get Spamassassin to notice when this comes in and
> automatically whitelist the recipients for me?

there is no need to do so, and you certainly don't want it done 
automatically: you only *think* you want it. Blind whitelisting is easy 
to trick with forged senders; whitelist_auth, by contrast, only applies 
when the sender is authenticated via DKIM or SPF
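The difference can be sketched as two predicates: a plain address match trusts whatever sender address the message claims, while an authenticated match (the idea behind whitelist_auth) also requires SPF or DKIM to vouch for the sender's domain. This is a conceptual Python sketch with hypothetical names and simplified inputs (real SPF checks the envelope sender, for instance), not SpamAssassin's implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class Sender:
    address: str             # sender address as claimed by the message
    spf_pass: bool           # simplified: did SPF pass for this domain?
    dkim_domains: frozenset  # domains with a valid DKIM signature

def blind_whitelisted(s: Sender, patterns) -> bool:
    # Trusts the claimed address alone, so a forged From line is enough.
    return any(fnmatch(s.address.lower(), p.lower()) for p in patterns)

def auth_whitelisted(s: Sender, patterns) -> bool:
    # Additionally requires SPF or DKIM to vouch for the address's domain.
    domain = s.address.rsplit("@", 1)[-1].lower()
    authenticated = s.spf_pass or domain in s.dkim_domains
    return authenticated and blind_whitelisted(s, patterns)

friends = ["*@example.org"]
forged = Sender("friend@example.org", spf_pass=False, dkim_domains=frozenset())
genuine = Sender("friend@example.org", spf_pass=True, dkim_domains=frozenset())
print(blind_whitelisted(forged, friends), auth_whitelisted(forged, friends))  # True False
print(auth_whitelisted(genuine, friends))  # True
```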

but typically you don't need much whitelisting unless you are a hosting 
provider who cares about load (combining whitelist_auth with shortcircuit)

