couchdb-user mailing list archives

From Adam Kocoloski <>
Subject Re: specifying an _id results in a much smaller DB?
Date Tue, 26 May 2009 23:42:48 GMT
On May 26, 2009, at 6:10 PM, Jeff Macdonald wrote:

> On Tue, May 26, 2009 at 5:36 PM, Chris Anderson <> wrote:
>> On Tue, May 26, 2009 at 2:31 PM, Jeff Macdonald <> wrote:
>>> Hi all,
>>> I've been experimenting with CouchDB. I'm using Net::CouchDB to batch
>>> insert 20 docs at a time, and I'm simply setting _id to a sequence that
>>> is incremented for each doc. For just over 9 million rows, where each
>>> row is just 6 small fields, the resulting DB is 3.4G. When I was letting
>>> CouchDB set the _id, the resulting database was over 20G. The input
>>> source as a tab-delimited file is just over 500MB.
>>> So is it normal for CouchDB to create such a large database file when it
>>> assigns document ids?
>> Yes, currently CouchDB docids are random, which means more of the btree
>> must be rewritten than if the ids were concentrated, as you see with
>> sequential ids. For high-performance applications, sequential ids are
>> faster as well.
>> Compacting may shrink your databases so they are roughly equal in size.
>> You can trigger compaction from Futon. I'd be interested to see what
>> results you get.
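Chris's point about sequential ids can be sketched in code. A minimal example, assuming a local CouchDB server and the standard `_bulk_docs` endpoint (the field names and batch helper below are hypothetical, not from the thread): zero-padding the sequence keeps lexicographic btree order identical to numeric insertion order, which is what concentrates the writes.

```python
import json

def make_batch(start, rows, width=10):
    """Build a _bulk_docs payload whose _ids sort in insertion order.

    Zero-padding the counter means the string _ids compare the same
    way the numbers do, so each batch lands at the right edge of the
    btree instead of scattering across it.
    """
    docs = []
    for i, row in enumerate(rows, start=start):
        doc = dict(row)
        doc["_id"] = str(i).zfill(width)
        docs.append(doc)
    return json.dumps({"docs": docs})

# Hypothetical usage: POST each payload to
#   http://127.0.0.1:5984/mydb/_bulk_docs
# with Content-Type: application/json.
payload = make_batch(1, [{"a": 1}, {"a": 2}])
```

Larger batches amortize the btree rewrites further, which is also why Jeff saw different behavior at 10 docs per batch versus 20.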
> Well, it took over a day to do it before. I was however only inserting
> 10 docs at a time then. So, right now I'm not motivated to find out how
> well the compaction would do. :)

I'd be _very_ surprised if the two compacted DBs differed substantially
in size. They should both weigh in smaller than 3.4G, since the
compactor writes documents in larger blocks than you appear to be doing.

I don't know anything about your server setup, but an order-of-magnitude
estimate for compacting a DB that size these days would be 1 hour, not
1 day. Best,
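For reference, triggering compaction doesn't require Futon: it is a single POST to the database's `_compact` resource. A minimal sketch, assuming a server at the default `127.0.0.1:5984` and a database named `mydb` (both assumptions, not from the thread):

```python
import urllib.request

def compaction_request(db, host="127.0.0.1", port=5984):
    """Build the POST that asks CouchDB to start compacting `db`.

    The server answers immediately with {"ok": true} and compacts
    in the background; the request body is empty.
    """
    return urllib.request.Request(
        url="http://%s:%d/%s/_compact" % (host, port, db),
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical usage (needs a running server):
#   urllib.request.urlopen(compaction_request("mydb"))
```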

