spamassassin-users mailing list archives

From RW <rwmailli...@googlemail.com>
Subject Re: Bayes refinement
Date Sat, 17 May 2014 13:26:09 GMT
On Fri, 16 May 2014 21:36:22 -0600
Bob Proulx wrote:

> David Jones wrote:
> > > James B. Byrne wrote:
> > > If you keep Bayes well trained (assuming you have enough ham to
> > > do so) Bayes poisoning is a myth.
> > 
> > I'm not sure I agree with the "myth" statement.  I just had to
> > reset my Bayes DB after years of it slowly drifting due to bad user
> > input and such.

That's mistraining. So-called "Bayes poisoning" is a deliberate attempt
by the sender to skew the classification.

> 
> Years?  How far back does your Bayes db store data?
..
> My Bayes db only has the last month's of data in it.  That is a
> completely stock configuration.  I think the storage is actually by
> number of tokens not age though.  It would be great if someone could
> explain that in better detail.

It is managed by number. Each token has an access time that records
when it last contributed to a classification or appeared in a learned
email; the least recently seen tokens get purged. What gets purged is a
mixture of obsolete (often ephemeral) tokens and the long tail of
infrequently seen tokens.
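
As a rough illustration of that purge-by-access-time idea, here is a
minimal Python sketch. It is a simplified model and not SpamAssassin's
actual expiry code; the names Token, touch and expire, the token
strings, and the integer timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    spam_count: int
    ham_count: int
    atime: int  # hypothetical timestamp: last time the token was seen


def touch(db, name, now):
    """Record that a token was just used in a classification or learn."""
    db[name].atime = now


def expire(db, max_tokens):
    """Purge the least recently seen tokens until max_tokens remain.

    Simplified model only; the real expiry logic is more involved.
    Returns the purged token names, oldest first.
    """
    excess = len(db) - max_tokens
    if excess <= 0:
        return []
    oldest_first = sorted(db, key=lambda name: db[name].atime)
    purged = oldest_first[:excess]
    for name in purged:
        del db[name]
    return purged


# Hypothetical database: one token learned long ago but still seen often,
# plus some ephemeral or rarely seen ones.
db = {
    "list-signature": Token(spam_count=0, ham_count=900, atime=5),
    "one-off-pill-name": Token(spam_count=3, ham_count=0, atime=10),
    "rare-word": Token(spam_count=0, ham_count=1, atime=20),
    "monthly-newsletter": Token(spam_count=0, ham_count=12, atime=70),
}

# The old token shows up in new mail, so its access time is refreshed.
touch(db, "list-signature", now=100)

purged = expire(db, max_tokens=2)
```

In this toy run the tokens with the stalest access times are the ones
dropped, while the frequently seen token survives even though it was
first learned long before the others.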

Having a month of retention doesn't mean that you only have a month of
data, because the most important tokens never get purged and so
contain information that can go back years.

If I were you, I'd increase the number of tokens; 150,000 is pretty small. Some
tokens are characteristic of senders or the mail servers they use, and
retaining those signature tokens helps to identify ham, and avoid FPs. I
wouldn't like to keep the retention below, or close to, a month because
some ham is sent monthly. 
