spamassassin-users mailing list archives

From Alex Regan <mysqlstud...@gmail.com>
Subject Re: Bayes autolearn questions
Date Tue, 09 Sep 2014 13:50:08 GMT
Hi,

>>> Did you understand that all
>>> tokens are learned, regardless whether they have been seen before?
>>
>> That doesn't really matter from a user perspective, though, right? I
>> mean, if tokens that have already been learned are learned again,
>> the net result is zero.
>
> Very much not zero. Each token has several values associated with it:
>   # ham
>   # spam
>   time-stamp
>
> So each time it's learned, its respective ham/spam counter is
> incremented, which indicates how spammy or hammy a given token is, and
> its time-stamp is updated, indicating how "fresh" the token is. The bayes
> expiry process removes "stale" tokens when it prunes the database down
> to size.

Ah, yes, of course. I knew about that, but somehow didn't put it 
together with this.
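
To make the point above concrete, here is a minimal sketch of per-token
bookkeeping. It is not SpamAssassin's actual storage code: the names
`TokenStats` and `learn` are made up for illustration, and counting each
token once per message is an assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenStats:
    ham: int = 0        # number of ham messages this token was learned from
    spam: int = 0       # number of spam messages this token was learned from
    atime: float = 0.0  # last time the token was touched (its "freshness")


def learn(db, tokens, is_spam, now=None):
    """Hypothetical learner: bump the counter of every token in a message."""
    now = time.time() if now is None else now
    for tok in set(tokens):  # assumption: count each token once per message
        stats = db.setdefault(tok, TokenStats())
        if is_spam:
            stats.spam += 1
        else:
            stats.ham += 1
        stats.atime = now    # relearning also refreshes the time-stamp


db = {}
learn(db, ["offer", "pills", "hello"], is_spam=True, now=1000.0)
learn(db, ["offer", "pills", "hello"], is_spam=True, now=2000.0)
learn(db, ["hello", "meeting"], is_spam=False, now=3000.0)
print(db["offer"])  # spam counter is 2 after the second spam message
print(db["hello"])  # seen in both spam and ham, freshest time-stamp
```

Learning the same tokens again from another message is therefore not a
no-op: the counters grow and the time-stamps move forward.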

I would like to know why, after training similar messages a number of 
times, it still shows the same bayes score on new similar messages.

I'd also like to figure out why, or how many more times, a message
needs to be re-trained before its score reflects the desired
classification.

I have one particular kind of FN that frequently hits bayes50, sometimes
lower. A few dozen similar messages every day are tagged as spam
properly, but they still hit bayes50 as well. I pull them out of the
quarantine and keep training them as spam, but a few still get through
every day.

Is there any particular analysis I can do on one of the FNs that can 
tell me how far off the bayes50 is from becoming bayes99 in a similar 
message?
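
One rough way to reason about the "how far off" question is to model it.
The sketch below is not SpamAssassin's implementation: it uses Robinson-style
token smoothing and chi-square combining as described for Bayesian spam
filters in general, and the constants `S_STRENGTH` and `X_PRIOR`, the function
names, and the example counts are all assumptions chosen for illustration.

```python
import math

S_STRENGTH = 1.0  # assumed weight given to the prior
X_PRIOR = 0.5     # assumed probability for a token with no history


def token_prob(spam, ham, nspam, nham):
    """Smoothed spam probability of one token from its ham/spam counters."""
    n = spam + ham
    if n == 0:
        return X_PRIOR
    spam_ratio = spam / nspam
    ham_ratio = ham / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    return (S_STRENGTH * X_PRIOR + n * p) / (S_STRENGTH + n)


def chi2q(x2, dof):
    """Upper tail of the chi-square distribution for an even dof."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine token probabilities into one score between 0 and 1."""
    if not probs:
        return 0.5
    probs = [min(max(p, 1e-6), 1 - 1e-6) for p in probs]  # avoid log(0)
    dof = 2 * len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), dof)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), dof)
    return (spamminess - hamminess + 1.0) / 2.0


def relearns_needed(counts, nspam, nham, target=0.99, limit=1000):
    """Count how many extra spam trainings push the score to `target`.

    `counts` is a list of (spam, ham) pairs, one per token in the message.
    Returns None if the target is not reached within `limit` rounds.
    """
    counts = [list(c) for c in counts]
    for rounds in range(limit + 1):
        score = combine([token_prob(s, h, nspam, nham) for s, h in counts])
        if score >= target:
            return rounds
        for c in counts:
            c[0] += 1  # every token in the message is learned as spam again
        nspam += 1
    return None


# Made-up counters for a short message whose tokens are nearly neutral.
neutral = [(5, 5), (3, 4), (10, 9)]
print(combine([token_prob(s, h, 1000, 1000) for s, h in neutral]))
print(relearns_needed(neutral, nspam=1000, nham=1000))
```

In this toy model, tokens seen about equally often in ham and spam keep the
score near the middle until their spam counters clearly dominate. On a real
installation, `sa-learn --dump magic` reports the learned ham and spam totals,
and running a message through `spamassassin -D bayes` prints Bayes debug
output that can be compared against this kind of reasoning.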

Hopefully that's clear. I understand there are a large number of
variables involved here, and I would think the fewer tokens a message
has, the harder it probably is to shift its score, but it's frustrating
to see bayes50 so repeatedly...

Thanks,
Alex
