spamassassin-users mailing list archives

From Matus UHLAR - fantomas <uh...@fantomas.sk>
Subject Re: Bayes overtraining
Date Wed, 08 Aug 2018 13:04:58 GMT
>> >On Wed, 25 Jul 2018 19:49:04 +0200
>> >Daniele Duca wrote:
>> >> In my current SA setup I use bayes_auto_learn along with some
>> >> custom poison pills (autolearn_force on some rules) , and I'm
>> >> currently wondering if over training SA's bayes could lead to the
>> >> same "prejudice" problem as CRM114.
>> >>
>> >> I'm thinking that maybe it would be better to use
>> >> "bayes_auto_learn_on_error 1"
>>
>> On 26.07.18 15:48, RW wrote:
>> >On a busy server using auto-learning it's probably a good idea to set
>> >this just to increase the token retention, and reduce writes into the
>> >database.

>On Thu, 26 Jul 2018 17:36:19 +0200 Matus UHLAR - fantomas wrote:
>> well, I have a bit different experience.

On 26.07.18 21:25, RW wrote:
>I didn't say auto-training itself is a good idea.

I mean, if I set bayes_auto_learn_on_error 1, mail whose score confirms the
BAYES decision would never be trained, even if that decision was correct.

That could result in BAYES scores drifting in the wrong direction.
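
For reference, the setting under discussion would sit in local.cf roughly as
follows. This is only a sketch: the two threshold lines are assumed to be the
stock defaults and are shown just to make clear what autolearn compares the
score against; they are not taken from any config posted in this thread.

# enable autolearning
bayes_auto_learn 1
# autolearn only when Bayes did not already agree with the verdict
bayes_auto_learn_on_error 1
# autolearn thresholds (assumed stock defaults)
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0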

I believe that once I have trained BAYES enough, autolearn should be able to
do the rest of the work and collect further tokens, especially when BAYES_00
or BAYES_99 is in effect.

Re-training a few misclassified mails once in a while should be better than
having to push back towards _00 and _99 because only mails pointing in the
opposite direction get trained.


>> There are spams hitting negative-scoring rules, e.g. MAILING_LIST_MULTI,
>> RCVD_IN_RP_*, RCVD_IN_IADB_*, and they are constantly trained as ham.

>You should be able to work around that by adding noautolearn to the
>tflags.

Well, since I tend to trust those rules less and less....

Especially because in the meantime I personally get many spams via mailing
lists I never subscribed to and never saw a subscription confirmation for.

...of the last 40 mails in my spambox, 14 match MAILING_LIST_MULTI
...of the last 100 mails in my spambox, 27 match MAILING_LIST_MULTI
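
The noautolearn workaround suggested above would look roughly like this for
one of the rules mentioned. A sketch only: it assumes a later tflags line
replaces the rule's existing flags, so whatever flags the stock rule already
carries (not checked here) would have to be repeated alongside noautolearn.

tflags MAILING_LIST_MULTI noautolearn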

>> I would like to prevent re-training when bayes disagrees with the score
>> coming from other rules.

>I don't know what you mean by 'prevent re-training', but auto-learning
>is not supposed to happen if Bayes generates 1 point or more in the
>opposite direction.

Either this is new to me or I have already forgotten it, but my impression
is different. I will try to remember and watch.

(I often watch what kind of mail was tagged autolearn=ham)

>> I quite wonder why the "learn" tflag causes the score to be ignored.
>> Only the "noautolearn" flag should be used for this, so at least
>> BAYES_99 and BAYES_00 could be taken into account when learning.

>It's to prevent mistraining from running away in a vicious circle.

I mean, since the tflag "noautolearn" is designed for this, the score of a
rule flagged "learn" should not be ignored.

It's easy to put:

tflags BAYES_99 learn noautolearn

but not possible to put:

tflags BAYES_99 learn dothefuckingautolearn



-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
The early bird may get the worm, but the second mouse gets the cheese. 
