spamassassin-users mailing list archives

From: "Bill Cole" <sausers-20150...@billmail.scconsult.com>
Subject: Re: Best practice for learning submissions
Date: Tue, 24 Jul 2018 04:49:50 GMT
[N.B.: Your prior correspondent is not able to post to this list, so we 
only saw your side of that exchange.]

On 23 Jul 2018, at 19:38 (-0400), Nick Bright wrote:

> When requesting submissions from users for use with sa-learn, if they 
> are going to forward the message somewhere; is it best for that to be 
> forwarded as an attachment, or forwarded inline? Will sa-learn 
> automatically understand "the spam is attached" if it's an attachment?
>
> Learning from a mailbox of my own spam (with full headers - the actual 
> mails) is quite different from users *forwarding* spam for training.
>
> So I ask: what is the best practice for learning submissions when 
> using site-wide bayes?

The goal is to get a copy of the message that is identical to what SA 
saw when it arrived. For IMAP users, this is easiest to get with a 
'missed spam' mailbox into which users can move messages for learning. 
If you must rely on forwarded submissions, make sure users are 
forwarding messages as attachments, and have the target deliver into a 
mailbox that is processed to extract the 'message/rfc822' MIME object(s) 
in those submissions and learn those, not the submission mail itself.
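
To illustrate the extraction step, here is a minimal Python sketch using 
only the standard library 'email' package. The function name 
extract_attached_messages is hypothetical, and reading the submission 
mailbox is left out. It assumes the attachment was not altered by the 
forwarding client; re-serializing with as_bytes() is usually faithful 
under the compat32 policy but is not guaranteed to be byte-identical to 
what SA originally saw.

```python
import email
from email import policy
from email.message import Message


def _collect(part: Message, found: list) -> None:
    """Append every attached message found under `part` to `found`."""
    if part.get_content_type() == "message/rfc822":
        payload = part.get_payload()
        # A parsed message/rfc822 part holds a one-element list containing
        # the attached message.
        if isinstance(payload, list) and payload:
            found.append(payload[0].as_bytes())
        # Do not descend: anything nested deeper belongs to the attached
        # message itself, not to the user's submission.
        return
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_attached_messages(raw_submission: bytes) -> list:
    """Return each message/rfc822 attachment of a submission as raw bytes.

    An inline-forwarded submission has no such part and yields an empty
    list, so the submission mail itself is never returned for learning.
    """
    # compat32 keeps headers as received instead of refolding them.
    submission = email.message_from_bytes(raw_submission, policy=policy.compat32)
    found = []
    _collect(submission, found)
    return found
```

Each returned byte string could then be written to its own file and 
handed to 'sa-learn --spam'. Submissions that yield nothing are probably 
inline forwards and are better bounced back to the user than learned.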

Learning ham is harder, because generally speaking it is not a good idea 
to deliver mail that SA believes is spam *at all* unless you can't 
reject it in SMTP. As a result, users don't have 'false positive' 
samples to submit (although their irate would-be correspondents 
could...). In an IMAP environment, you can identify borderline ham that 
is useful to learn by looking at tagging and archiving. If the user 
assigns a keyword to a message and/or moves it to a mailbox (other than 
ones with names like Junk and Spam and Trash) you can usually be sure it 
is ham. If your users are trainable (it DOES happen...) you might even 
get them to use specific keywords and/or archival mailboxes and use 
those to feed ham training. In a POP3 environment, this is a much harder 
problem to solve.
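
The tagging/archiving heuristic above could be sketched roughly as 
follows. Everything here is an assumption for illustration: the function 
name is_probable_ham, the excluded mailbox names, the "/" hierarchy 
delimiter, and the choice to ignore IMAP system flags (those starting 
with a backslash) and client-set "$" keywords while treating "$Junk" as a 
veto. Fetching mailbox names and flags from the IMAP server is not 
shown.

```python
# Leaf mailbox names that suggest the user considers the message junk.
EXCLUDED_MAILBOXES = {"junk", "spam", "trash"}


def is_probable_ham(mailbox: str, flags: set, delimiter: str = "/") -> bool:
    """Guess whether a message is ham from where the user filed or tagged it."""
    leaf = mailbox.split(delimiter)[-1].lower()
    if leaf in EXCLUDED_MAILBOXES:
        return False

    lowered = {flag.lower() for flag in flags}
    if "$junk" in lowered:
        return False

    # Keep only user-assigned keywords: drop system flags such as \Seen
    # and client-set keywords such as $Forwarded.
    user_keywords = {f for f in lowered if not f.startswith(("\\", "$"))}
    if user_keywords:
        return True

    # A message sitting anywhere other than INBOX is assumed to have been
    # moved (archived) there by the user.
    return mailbox.upper() != "INBOX"
```

Messages for which this returns True would be candidates for 
'sa-learn --ham'. Note that server-side filters also file mail into 
folders, so a site using them would need to tighten the last rule.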

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steadier Work: https://linkedin.com/in/billcole
