spamassassin-users mailing list archives

From Matus UHLAR - fantomas <uh...@fantomas.sk>
Subject Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
Date Wed, 14 Feb 2018 15:20:30 GMT
>On Tue, 13 Feb 2018 21:02:46 +0000
>Horváth Szabolcs wrote:
>> One more question: is there a recommended ham to spam ratio? 1:1?

On 14.02.18 15:09, RW wrote:
>No, this is a myth.  Bayes computes token probabilities from a token's
>frequencies in spam and ham, so it all scales through. If you have
>2000 ham and 200 spam the problem is too few spams, not a bad ratio.
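
The scaling argument quoted above can be illustrated with a minimal sketch. It assumes a simplified per-token estimate built from normalized frequencies; this is not SpamAssassin's actual code, and the function name and the numbers are hypothetical.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token spam probability (illustrative assumption).

    spam_hits / ham_hits: trained spam / ham messages containing the token.
    n_spam / n_ham: total trained spam / ham messages.
    """
    spam_freq = spam_hits / n_spam  # fraction of spam containing the token
    ham_freq = ham_hits / n_ham     # fraction of ham containing the token
    if spam_freq + ham_freq == 0:
        return 0.5                  # token never seen: neutral
    return spam_freq / (spam_freq + ham_freq)


# Same token behaviour, two different corpus ratios (made-up numbers).
balanced = token_spam_probability(spam_hits=100, ham_hits=10, n_spam=200, n_ham=200)
skewed = token_spam_probability(spam_hits=100, ham_hits=100, n_spam=200, n_ham=2000)
```

Both calls give the same probability because each count is divided by the size of its own corpus, so the ham:spam ratio cancels out. What a small corpus costs is reliability of the counts, not a bias from the ratio.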

in my experience you need more ham than spam, because you want to get rid of
false positives (ham marked as spam) much more than of false negatives
(spam marked as ham).

what really matters is how many FPs/FNs you have. you can reduce them by
training on any ham that scores too far from BAYES_00 and any spam that scores
too far from BAYES_99.
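
A minimal sketch of that selection, assuming each message carries an X-Spam-Status header that lists the BAYES_NN rule it hit and that you already know whether the message is really ham or spam. The function names, the default thresholds, and the choice to retrain messages with no BAYES hit are all assumptions for illustration.

```python
import re

# Matches rule names such as BAYES_00, BAYES_50, BAYES_99 in a header value.
BAYES_RULE = re.compile(r"\bBAYES_(\d{2,3})\b")


def bayes_bucket(spam_status):
    """Return the numeric part of the first BAYES_NN rule, or None."""
    match = BAYES_RULE.search(spam_status)
    return int(match.group(1)) if match else None


def needs_training(spam_status, is_spam, ham_max=0, spam_min=99):
    """Decide whether a hand-classified message is worth feeding to Bayes.

    ham_max / spam_min are hypothetical thresholds for "too far".
    """
    bucket = bayes_bucket(spam_status)
    if bucket is None:
        return True  # assumption: no BAYES hit means Bayes knows nothing yet
    if is_spam:
        return bucket < spam_min  # spam that did not reach BAYES_99
    return bucket > ham_max       # ham that scored above BAYES_00
```

For example, ham tagged with `tests=BAYES_50,HTML_MESSAGE` would be selected for training as ham, while spam already hitting BAYES_99 would be skipped.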
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
LSD will make your ECS screen display 16.7 million colors
