Date: Tue, 21 Jun 2011 10:23:31 -0400
From: "David F. Skoll"
To: users@spamassassin.apache.org
Subject: Re: High Performance Bayes Database Configuration?
Message-ID: <20110621102331.2e59f1a1@hydrogen.roaringpenguin.com>
In-Reply-To: <4E00A553.5000708@junkemailfilter.com>

On Tue, 21 Jun 2011 07:06:11 -0700
Marc Perkel wrote:

> Trying to get MySQL Bayes working in a high volume environment.
> Dedicated MySQL server with SSD drives. Can someone send me a sample
> my.cnf file and make other suggestions to keep it running without
> database corruption and other MySQL "features"? Or - should I be
> using some other DB?

We've tried various ways of storing Bayes data. (We have our own Bayes
implementation, so this discussion may not correspond exactly to the SA
implementation.)

After trying Berkeley DB files and PostgreSQL---we would never use MySQL
for any data we care about---we finally settled on Dan Bernstein's CDB
format. It has by far the best performance. See:

http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/

Take a look at the "Random Reads" timings.
CDB is 6 times faster than Berkeley DB!

CDB is read-only, which means when you want to do Bayes training, you
have to rewrite the entire database. This is not an issue for our
system because of how we do Bayes training, but it may be an issue
with the standard sa-learn.

Regards,

David.
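
A minimal sketch of what "rewrite the entire database" can look like in practice. This is not the Roaring Penguin implementation and not anything sa-learn does. It follows the published cdb file layout (a 2048-byte header of 256 table pointers, then the records, then the hash tables), and everything else is an assumption made for illustration: the function names (cdb_hash, write_cdb, cdb_get, read_all, retrain, lookup) are hypothetical, each token's value is assumed to be two 32-bit counters (spam, ham), and the file is read fully into memory where a real reader would more likely mmap it.

```python
import os
import struct
import tempfile


def cdb_hash(key: bytes) -> int:
    """cdb hash: start at 5381, then h = (h * 33) XOR byte, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def write_cdb(path: str, items: dict) -> None:
    """Write a complete cdb to a temp file, then atomically rename it into place."""
    buckets = [[] for _ in range(256)]
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * 2048)  # placeholder for the 256-entry header
        pos = 2048
        for key, val in items.items():
            f.write(struct.pack("<II", len(key), len(val)) + key + val)
            h = cdb_hash(key)
            buckets[h & 255].append((h, pos))
            pos += 8 + len(key) + len(val)
        header = []
        for entries in buckets:
            nslots = 2 * len(entries)  # tables are kept half empty
            header.append((pos, nslots))
            slots = [(0, 0)] * nslots
            for h, p in entries:
                i = (h >> 8) % nslots
                while slots[i][1]:  # linear probing; position 0 means empty
                    i = (i + 1) % nslots
                slots[i] = (h, p)
            f.write(b"".join(struct.pack("<II", h, p) for h, p in slots))
            pos += 8 * nslots
        f.seek(0)
        f.write(b"".join(struct.pack("<II", p, n) for p, n in header))
    os.replace(tmp, path)  # readers see either the old file or the new one


def cdb_get(path: str, key: bytes):
    """Look up one key; returns the value bytes or None."""
    with open(path, "rb") as f:
        data = f.read()
    h = cdb_hash(key)
    tpos, nslots = struct.unpack_from("<II", data, 8 * (h & 255))
    if nslots == 0:
        return None
    i = (h >> 8) % nslots
    for _ in range(nslots):
        slot_hash, rpos = struct.unpack_from("<II", data, tpos + 8 * i)
        if rpos == 0:
            return None
        if slot_hash == h:
            klen, dlen = struct.unpack_from("<II", data, rpos)
            if data[rpos + 8:rpos + 8 + klen] == key:
                return data[rpos + 8 + klen:rpos + 8 + klen + dlen]
        i = (i + 1) % nslots
    return None


def read_all(path: str) -> dict:
    """Walk every record; the record area ends where the first hash table begins."""
    with open(path, "rb") as f:
        data = f.read()
    end = min(struct.unpack_from("<I", data, 8 * i)[0] for i in range(256))
    items, pos = {}, 2048
    while pos < end:
        klen, dlen = struct.unpack_from("<II", data, pos)
        key = data[pos + 8:pos + 8 + klen]
        items[key] = data[pos + 8 + klen:pos + 8 + klen + dlen]
        pos += 8 + klen + dlen
    return items


def retrain(path: str, new_counts: dict) -> None:
    """Merge token -> (spam, ham) increments, then rewrite the whole file."""
    items = read_all(path) if os.path.exists(path) else {}
    for token, (spam, ham) in new_counts.items():
        old_spam, old_ham = struct.unpack("<II", items.get(token, b"\0" * 8))
        items[token] = struct.pack("<II", old_spam + spam, old_ham + ham)
    write_cdb(path, items)


def lookup(path: str, token: bytes):
    """Return (spam, ham) counts for a token, or None if it is absent."""
    raw = cdb_get(path, token)
    return struct.unpack("<II", raw) if raw is not None else None
```

In this sketch every call to retrain() copies all existing records into a fresh file, so the cost of training grows with the size of the database rather than with the number of tokens being updated. That is why batching training into infrequent rebuilds suits this format better than per-message updates.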