spamassassin-users mailing list archives

From "David F. Skoll" <...@roaringpenguin.com>
Subject Re: High Performance Bayes Database Configuration?
Date Tue, 21 Jun 2011 14:23:31 GMT
On Tue, 21 Jun 2011 07:06:11 -0700
Marc Perkel <support@junkemailfilter.com> wrote:

> Trying to get MySQL Bayes working in a high volume environment.
> Dedicated MySQL server with SSD drives. Can someone send me a sample
> my.cnf file and make other suggestions to keep it running without
> database corruption and other MySQL "features"? Or - should I be
> using some other DB?

We've tried various ways of storing Bayes data.  (We have our own Bayes
implementation, so this discussion may not correspond exactly to the
SA implementation.)  After trying Berkeley DB files and PostgreSQL (we
would never use MySQL for any data we care about), we finally settled
on Dan Bernstein's CDB format.  It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Random Reads" timings.  CDB is 6 times faster than
Berkeley DB!

CDB is read-only, which means when you want to do Bayes training, you
have to rewrite the entire database.  This is not an issue for our
system because of how we do Bayes training, but it may be an issue
with the standard sa-learn.
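
To illustrate what "rewrite the entire database" means in practice, here
is a minimal Python sketch of the rebuild-and-swap pattern.  It is not our
implementation and not how sa-learn works: the names rebuild_token_db and
train are hypothetical, and JSON is only a stand-in for a real CDB writer
so the example stays self-contained.  The idea is to build a complete new
file next to the old one and rename it into place, so readers always see
either the old database or the new one, never a partial write.

```python
import json
import os
import tempfile


def rebuild_token_db(path, token_counts):
    """Write a complete new database file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename below
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bayes-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(token_counts, tmp)  # stand-in for a real CDB writer
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace renames over the existing file in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def train(path, new_tokens):
    """Merge new (spam, ham) counts into the old data and rebuild the file."""
    counts = {}
    if os.path.exists(path):
        with open(path) as f:
            counts = json.load(f)
    for token, (spam, ham) in new_tokens.items():
        old_spam, old_ham = counts.get(token, (0, 0))
        counts[token] = [old_spam + spam, old_ham + ham]
    rebuild_token_db(path, counts)
    return counts
```

Every call to train rewrites the whole file, so the cost grows with the
size of the database rather than with the amount of new training data.
Batching many training messages into one rebuild keeps that cost
manageable.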

Regards,

David.
