spamassassin-users mailing list archives

From Dave Koontz <dkoo...@mbc.edu>
Subject Re: bayes_seen = 256GB
Date Wed, 19 Sep 2007 23:05:38 GMT
Thanks Michael.  I don't see anything in Bugzilla, so I have added a
request there and am posting it to the list as well.  (see Bug 5652)

BTW, the link on the submission page for "bug writing guidelines"
generates a 404 error. So I will take my best guess here.

My request is below.  I'd love to take this on myself, but I am far from
a Perl expert.  Any Perl / SA gurus out there who can look at this?
Complaints from average users keep coming in to this list; generally
they notice this flaw only after they run out of resources.

Bugzilla #5652 - bayes_seen - auto expire
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5652

---

The bayes_seen db grows without any purge cycle, even if previously learned
tokens have long since been expired from the main bayes db.  Users who are not
SA-savvy often complain of oversized seen db files, at times 250MB to 4GB.

Request for a new process and variable to control the seen db size... perhaps:

Bayes_Unlearn_Threshold_days

Where a user could enter a value for how many days to keep the seen DB entries
and expire those older than that threshold.  Perhaps a DEFAULT value of 7 days
would be in order, as most spam campaigns last a single day at most.  A 30 day
purge should be more than safe for almost anyone and beats a non-expiry system.
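
As a rough illustration of the request, here is a minimal sketch in Python
rather than Perl. It is not SpamAssassin code. It assumes each seen entry
records when the message was learned, which the current seen db may not do,
and the names purge_seen and threshold_days are hypothetical.

```python
import time

SECONDS_PER_DAY = 86400


def purge_seen(seen, threshold_days, now=None):
    """Drop entries older than threshold_days; return how many were removed.

    `seen` maps a message id to the Unix time it was learned. This layout
    is an assumption for illustration, not the real bayes_seen format.
    """
    if now is None:
        now = time.time()
    cutoff = now - threshold_days * SECONDS_PER_DAY
    # Collect the stale ids first so the dict is not modified while iterating.
    stale = [msgid for msgid, learned_at in seen.items() if learned_at < cutoff]
    for msgid in stale:
        del seen[msgid]
    return len(stale)
```

With a threshold of 7, an entry learned 20 days ago would be dropped and one
learned yesterday would be kept.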



Michael Parker wrote:
> Dave Koontz wrote:
>   
>> Theo and all.  I know this topic comes up on occasion, but I am not sure
>> I've ever seen an explanation as to why the bayes_seen file is not auto
>> pruned along with the bayes db file.  Since tokens expire in the main DB
>> file, what is the purpose of having a seen file to unlearn tokens which
>> may have long ago been purged?   IMO, it would seem logical to also
>> purge the seen file at some sort of cycle so it can't grow so
>> excessively large.
>>
>>     
>
> In order to expire from bayes_seen you have to know that there are no
> longer any tokens from a given msg in the bayes_token database.  This is
> a hard problem, mapping tokens to msgs, so it wasn't done.  Likewise no
> one ever did anything about expiring the bayes_seen entries.
>
> Sounds like a good project, there might even be a bugzilla enhancement
> opened already.
>
> Patches are welcome.
>
> Michael
>
>
>
>   
>> Theo Van Dinter wrote:
>>     
>>> On Wed, Sep 19, 2007 at 03:23:50PM -0600, Mr. Gus wrote:
>>>   
>>>       
>>>>> The file bayes_seen has grown in size to 256GB!  (274992939008)
>>>>> How do I cap the size limit of this file? I want to have it not grow
>>>>> larger than say 800mb at the most!
>>>>>       
>>>>>           
>>>> You need to expire old bayes tokens. The limit is set not as a size,
>>>> but as
>>>>     
>>>>         
>>> Expiring bayes tokens does nothing to the bayes_seen file.  There is no expiry
>>> for bayes_seen.
>>>
>>> If the seen file is bigger than you'd like, I'd just rm the file.
>>>
>>>   
>>>       
>
>   
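
To illustrate Michael's point about mapping tokens to msgs, here is a small
Python sketch, again not SpamAssassin code. It assumes a reverse index from
each message id to the tokens learned from it. The thread says no such mapping
exists today, so msg_tokens and expirable_messages are hypothetical names.

```python
def expirable_messages(msg_tokens, live_tokens):
    """Return ids of messages that have no tokens left in the token database.

    msg_tokens:  dict of message id -> set of tokens learned from it
                 (a hypothetical reverse index, assumed for illustration).
    live_tokens: set of tokens still present in the token database.
    """
    return sorted(
        msgid
        for msgid, tokens in msg_tokens.items()
        if not (tokens & live_tokens)  # no overlap: every token has expired
    )
```

Keeping such an index would mean storing every message's token set, which is
part of why an age-based threshold looks simpler.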

