incubator-couchdb-user mailing list archives

From Michael McDaniel
Subject Re: Write Performance
Date Thu, 08 Jan 2009 01:23:25 GMT

 I found that bulk loading is significantly faster, if you can format
 your documents into a file.  Sometimes that is not so handy to do.
 Of course, substitute your own IP, and your db name for 'test'.

$ curl -X POST --data @file_of_docs 

where file_of_docs looks, literally, like

{ "docs" : [
    {"name":"test_one" , "date":"Sun Jan 4, 2008" , "place":"Portland" } ,

       more documents ...

    {"name":"test_n" , "date":"Tue Jan 6, 2008" , "place":"Portland" }
] }

I say "literally" as in, the quotes you see are the quotes you need, no
escaping and no extra quotes before/after the leading/trailing { brackets }

Nothing needed escaping in the above.  Possibly some characters would need
escaping, but not whitespace.
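
If writing the file by hand is not handy, a short script can generate it.
Below is a minimal sketch in Python, assuming the documents are already
available as dicts.  The function names and the sample data are made up
for illustration and are not part of couchdb.  The json module takes care
of any escaping inside strings (double quotes, backslashes, control
characters), so that does not have to be done by hand.

```python
import json


def build_bulk_payload(docs):
    """Wrap an iterable of dicts in the {"docs": [...]} envelope."""
    # json.dumps escapes quotes, backslashes and control characters.
    return json.dumps({"docs": list(docs)})


def write_bulk_file(docs, path):
    """Write the bulk payload to a file that curl can send with --data @path."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_bulk_payload(docs))


# Hypothetical sample data modelled on the short docs shown above.
sample_docs = [
    {"name": "test_%d" % i, "date": "Sun Jan 4, 2008", "place": "Portland"}
    for i in range(1, 4)
]

if __name__ == "__main__":
    write_bulk_file(sample_docs, "file_of_docs")
```

The resulting file_of_docs can then be posted with the same curl command
as above.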

Using Erlang R12B-5 and couchdb - Apache CouchDB 0.9.0a730600-incubating,
I loaded 10,000 of the above short docs in about 5 seconds.  Both Erlang
and couchdb were compiled from scratch on the following machine.

Linux version 2.6.24-21-server (buildd@palmer) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7))
#1 SMP Wed Oct 22 00:18:13 UTC 2008 (Ubuntu 2.6.24-21.43-server)

dmesg says server has:
Memory: 1538064k/1563840k available
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Total of 2 processors activated (11973.88 BogoMIPS)


On Wed, Jan 07, 2009 at 06:37:32PM -0600, Josh Bryan wrote:
> Hi,
> I am looking into CouchDB as a solution to store a bunch (approx 70
> million) archived documents.  While planning for the import process, I
> did some benchmarking to figure out how long the import will take.  I
> get about 50-70 inserts per second on average.  However, when I looked
> for the bottleneck, I couldn't figure it out.  I am connected to the
> database via a fast lan and can verify that the network is not
> saturated.  I can also verify that disk IO is not saturated.  The only
> clue is that of the 4 cpus on the server, it seems that only one is
> getting fully loaded.  Also, of the 5 erlang processes I can see
> running, only one of them seems to be getting most of the cpu time.  I
> know that erlang is built with smp enabled, so if it is cpu bound, why
> can't it make use of the other 3 processors?
> I thought that perhaps there was some internal write lock issue per
> database that allowed only one thread to write to a db at a time, so I
> tried running the benchmarks while hitting multiple databases, but still
> got the same write rate across the databases.  Is there some globally
> shared resource in couchdb that limits all writes to a single thread? 
> Thanks,
> Josh

Michael McDaniel
Portland, Oregon, USA
