From: Henrik Thostrup Jensen
To: user@couchdb.apache.org
Date: Thu, 4 Mar 2010 14:41:56 +0100
Subject: A short _id size performance report and question regarding 0.11 performance

Hi

We're running CouchDB in production, and are currently storing around 800K records in it. Lately view performance has started to become a limiting factor, especially when creating new views or changing existing ones (which is essentially creating a new view). We are currently using 56-byte _id fields, which I've come to realize was a bad choice. So I've made a few tests with smaller _id fields and decided to post them here.

Unfortunately we cannot use the UUIDs assigned by CouchDB, as we rely on the _id field to detect duplicate records. This is somewhat inherent in the way we collect distributed information; duplicates don't happen particularly often, but the check is definitely needed. Our data is also somewhat heterogeneous, and we often generate view keys based on different data items in the records, including the actual data values (so relational is a somewhat poor fit for us).

I've done tests with 56-, 22-, and 12-byte _id fields. The initial tests were done with CouchDB 0.10.0 on Karmic. I've tried 0.11 as well (more on that later in this mail). 4-byte _id fields are not really possible for us, as we would have a significant chance of getting different records with the same _id. 8 bytes should be possible though, but wasn't tested.
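As a rough sanity check on the collision concern, here is a minimal sketch of the standard birthday-bound approximation. It assumes every byte of the _id carries a full 8 bits of uniformly random data, which is an upper bound: hex- or base64-encoded ids carry fewer bits per byte, so the real collision risk would be higher. The function name `collision_probability` is purely illustrative.

```python
import math


def collision_probability(num_records: int, id_bytes: int) -> float:
    """Approximate P(at least two records share an _id).

    Uses the birthday bound p ~= 1 - exp(-n(n-1) / (2N)), where N is the
    number of possible ids. Assumes ids are uniformly random and that each
    byte contributes 8 bits (an optimistic assumption for text ids).
    """
    id_space = 2 ** (8 * id_bytes)
    pairs = num_records * (num_records - 1) / 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / id_space)


if __name__ == "__main__":
    # 800K is the approximate record count mentioned above.
    for size in (4, 8, 12):
        p = collision_probability(800_000, size)
        print(f"{size:2d}-byte _id: collision probability ~ {p:.3g}")
```

Under these assumptions, 800K records with 4-byte ids give a collision probability that is essentially 1, while 8-byte ids give something on the order of 1e-8.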

Test 1: Insert 70k records into the database (in the same order each time), in chunks of 100, and measure the db size.

  _id size    After insert    After compaction
  56 bytes    207.0 MB        146.7 MB
  22 bytes    175.6 MB        125.8 MB
  12 bytes    165.8 MB        120.0 MB

Test 2: Construct a simple view over the data and measure the view size.

  _id size    After build     After compaction
  56 bytes    73 MB           19 MB
  22 bytes    54 MB           14 MB
  12 bytes    47 MB           12 MB

Test 3: Time for constructing a temporary view.

  _id size    Time
  56 bytes    70 seconds
  22 bytes    57 seconds
  12 bytes    53 seconds

In short, smaller _id fields provide a nice space reduction and save a bit of time, but they do not make view building significantly faster.

I built the current branch of 0.11 on Karmic, as collation performance should have improved with that. I only redid the 12-byte _id tests.

Test 1:
  After initial insert: 151.3 MB (a bit smaller than 0.10)
  After compaction:     120.0 MB (same as 0.10)

Test 2:
  Initial view build size: 153 MB (quite a lot more than 0.10)
  After compaction:        12 MB (same as 0.10)

Test 3:
  Time for constructing temporary view: 121 seconds (more than twice the 0.10 time)

Does anyone have an idea of what could be wrong? The increased view build time worries me especially, as I was hoping 0.11 could provide a needed performance boost for us.

Please CC any replies, as I am not subscribed.

-- 
- Henrik
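
For anyone who wants the relative changes rather than the raw numbers, the following is a small sketch that recomputes them from the figures reported above. Only the measurements are from the tests; the helper names `percent_reduction` and `slowdown_factor` are illustrative.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def slowdown_factor(old_seconds: float, new_seconds: float) -> float:
    """How many times longer the new run took compared to the old one."""
    return new_seconds / old_seconds


# CouchDB 0.10.0 figures from the tests above, keyed by _id size in bytes.
compacted_db_mb = {56: 146.7, 22: 125.8, 12: 120.0}
compacted_view_mb = {56: 19.0, 22: 14.0, 12: 12.0}
temp_view_seconds = {56: 70.0, 22: 57.0, 12: 53.0}

if __name__ == "__main__":
    series = [
        ("compacted db size", compacted_db_mb),
        ("compacted view size", compacted_view_mb),
        ("temporary view time", temp_view_seconds),
    ]
    for label, data in series:
        for id_size in (22, 12):
            change = percent_reduction(data[56], data[id_size])
            print(f"{label}, 56 -> {id_size} bytes: {change:.1f}% less")

    # 12-byte _id, temporary view build: 53 s on 0.10 vs 121 s on 0.11.
    print(f"0.11 vs 0.10 view build: {slowdown_factor(53.0, 121.0):.2f}x")
```

This only restates the measurements in relative terms; it says nothing about why the 0.11 build was slower.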