Subject: Re: specifying an _id results in a much smaller DB?
From: Chris Anderson
To: user@couchdb.apache.org
Date: Tue, 26 May 2009 14:36:06 -0700

On Tue, May 26, 2009 at 2:31 PM, Jeff Macdonald wrote:
> Hi all,
> I've been experimenting with CouchDB. I'm using Net::CouchDB to batch insert
> 20 docs at a time and I'm simply setting _id to a sequence that is
> incremented for each doc. For just over 9 million rows where each row is
> just 6 small fields the resulting DB is 3.4G. When I was letting CouchDB set
> the _id, the resulting database was over 20G. The input source as a tab
> delimited file is just over 500MB.
>
> So is it normal for CouchDB to create such a large database file when it
> assigns document ids?
>

Yes. Currently the docids CouchDB assigns are random, which means more of
the btree must be rewritten than if the ids were concentrated, as you see
with sequential ids. For high-performance applications, sequential ids are
faster as well.

Compacting may shrink your databases so they are roughly equal in size. You
can trigger compaction from Futon. I'd be interested to see what results
you get.

> --
> Jeff Macdonald
> Ayer, MA
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
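
For illustration, here is a minimal sketch of the sequential-id batching described in the question. It is written in Python rather than with Net::CouchDB, and it only builds the JSON request bodies; it sends nothing. The function names (sequential_ids, bulk_payloads), the zero-padding width, and the assumption that each body would be POSTed to the database's _bulk_docs endpoint are choices made for this example, not something taken from the thread.

```python
import json
from itertools import islice


def sequential_ids(start=1, width=10):
    """Yield zero-padded, increasing ids so string order matches numeric order."""
    n = start
    while True:
        yield f"{n:0{width}d}"
        n += 1


def bulk_payloads(rows, batch_size=20, start=1):
    """Group rows (dicts) into JSON bodies of the form {"docs": [...]}.

    Each document gets the next sequential _id. The bodies are assumed
    to be sent to <database>/_bulk_docs by whatever HTTP client you use.
    """
    ids = sequential_ids(start)
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        docs = [{"_id": next(ids), **row} for row in batch]
        yield json.dumps({"docs": docs})


if __name__ == "__main__":
    # Hypothetical rows standing in for lines of the tab-delimited input.
    sample = ({"field": i} for i in range(45))
    for body in bulk_payloads(sample):
        print(len(json.loads(body)["docs"]), "docs in this batch")
```

Because consecutive ids sort next to each other, each batch should touch a concentrated part of the btree instead of being scattered across it, which is the effect described above. Compaction is a separate step and is not covered by this sketch.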