incubator-couchdb-user mailing list archives

From Simon Metson <>
Subject Re: couchdb and millions of records
Date Mon, 26 Jul 2010 17:41:17 GMT
We've done things at this scale with CouchDB. The key is to do bulk
inserts and to trigger view indexing as you go. For instance, by
default our code bulk-inserts 5000 records, hits a view, then inserts
the next 5000, hits the view again, and so on. The batch size is
something you'd want to tune, since it depends on your documents and
views. Indexing the view incrementally is much quicker than indexing
all N million records at once. You might also want to run view and
database compaction occasionally, especially if you're also doing bulk
deletes.
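
A minimal sketch of that insert-then-index loop in Python, not our
actual code. It assumes the standard _bulk_docs endpoint and a view
query with limit=0 to trigger indexing; the database URL, design
document name, and the helper names (batches, couch_request, bulk_load)
are all hypothetical.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def couch_request(method, url, body=None):
    """Send one JSON request to CouchDB and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_load(docs, db_url, view_path, batch_size=5000, request=couch_request):
    """Insert docs in batches, querying the view after each batch.

    `view_path` is assumed to look like "_design/<ddoc>/_view/<view>".
    Returns the number of documents sent.
    """
    total = 0
    for chunk in batches(docs, batch_size):
        request("POST", db_url + "/_bulk_docs", {"docs": chunk})
        # Reading the view makes the indexer catch up on this batch only;
        # limit=0 avoids transferring any rows back.
        request("GET", db_url + "/" + view_path + "?limit=0")
        total += len(chunk)
    return total
```

Something like bulk_load(docs, "http://localhost:5984/wiki",
"_design/dates/_view/by_date") would drive it, with batch_size tuned
as described above. Compaction could be triggered the same way with a
POST to /_compact (database) or /_compact/<design doc name> (views).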

On 26 Jul 2010, at 18:00, Norman Barker wrote:

> Hi,
> I have sampled the wikipedia tsv collection from freebase
> (, I ran this
> through awk, dropped the xml field, and then did a simple conversion to
> JSON. I then call _bulk_docs 150 docs at a time into couch 0.11.
> I wrote a simple view in erlang that emits the date as a key (I am
> actually using this to test the free text search couchdb-clucene), the
> views are fast once computed.
> The amount of disk storage used by couchdb is an issue, and the write
> times are slow. I changed my view, and the view computation over my
> 2.3 million documents is still running!
>        "request_time": {
>            "description": "length of a request inside CouchDB without MochiWeb",
>            "current": 2253451.122,
>            "sum": 2253451.122,
>            "mean": 501.212,
>            "stddev": 12275.385,
>            "min": 0.5,
>            "max": 798124.0
>        },
> For my use case, once the system is up there are only a few updates
> per hour, but doing the initial harvest takes a long time.
> Does 1.0 make substantial gains on this, and if so, how? Are there
> any other areas that I should be looking at to improve this? I am
> happy writing erlang code.
> thanks,
> Norman
