incubator-couchdb-user mailing list archives

From Robert Newson <>
Subject Re: Why are reads from CouchDB so slow? (1.5MB/s or thereabouts)
Date Fri, 23 Mar 2012 13:28:00 GMT
I'd be intrigued to know how we could store the raw JSON on disk given
the make_blocks behavior, but yes, the reason couchdb isn't giving
sendfile()-like performance is the json<>erlang conversion, at least,
plus other work like reading btree nodes to find the data (even if
they come from cache), etc.

What I was driving at with 'use a benchmarking tool' was to eliminate
artifacts like the time curl takes to start the connection, etc. All
the tools I listed record the time of the actual request/response.
With ab and nodeload I can get similar figures, though I can also
crank up concurrency and get the same numbers for each request (but
10x the total throughput). Without those kinds of options (number of
users, tcp keep-alive, http keep-alive) it's very hard to discuss and
compare benchmarks. Curl just isn't enough.
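For reference, an invocation along the lines Robert describes might look like
this (a sketch only: it assumes a local CouchDB listening on port 5984, and the
database and document names are placeholders):

```shell
# Hypothetical example: 1000 GETs of one document with 10 concurrent
# clients and HTTP keep-alive enabled.
#   -n  total number of requests
#   -c  concurrency (number of simultaneous clients)
#   -k  use HTTP keep-alive
ab -n 1000 -c 10 -k http://127.0.0.1:5984/mydb/mydoc
```

With those knobs you can hold per-request latency steady while raising
concurrency, which is the comparison curl alone can't express.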


On 23 March 2012 13:23, Jonathan Williamson <> wrote:
> Volker,
> Thanks for the input, that all sounds likely! It's not a massive
> problem for us to store a raw copy of our data alongside our Couch
> databases.
> I do, however, think this falls slightly under "unexpected behaviour", as
> intuitively I think a lot of people would expect raw read speeds to be
> pretty fast and not to require such heavy CPU usage. That said, it's easy
> to work around; it's just a bit of a surprise to come across.
> Love CouchDB for all the things it does for us so well - it's a great product!
> Jon.
> On Fri, Mar 23, 2012 at 1:17 PM, Volker Mische <> wrote:
>> I agree that using a trusted benchmarking tool is the way to go. Though
>> what Jonathan sees is pretty clear. It's the JSON -> Eterm -> String
>> conversion that CouchDB is currently doing. Filipe proposed a patch that
>> stores the raw JSON on disk, to get rid of most of this conversion. I don't
>> remember exactly, but I'm pretty sure he provided sensible benchmarking
>> results back then.
>> Hence the point of this thread shouldn't be "go, do it properly", but
>> rather: we have a clue why it is so slow. We don't store raw JSON but
>> Eterms on disk, which need to be assembled into a string every time you
>> request a document.
>> Cheers,
>>  Volker
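The Eterm-versus-raw-JSON trade-off Volker describes can be sketched in a few
lines (a rough Python analogy, not CouchDB's actual Erlang code; the document
shape and sizes here are made up):

```python
import json
import timeit

# An illustrative document; the shape and size are arbitrary.
doc = {"_id": "doc1", "values": list(range(1000))}

# Strategy 1 (what the thread says CouchDB does): the parsed structure
# is what's stored, so every read must re-serialize it to JSON.
parsed = doc
read_with_encode = lambda: json.dumps(parsed)

# Strategy 2 (the proposed patch, as described): raw JSON bytes are
# stored, so a read just hands the bytes back.
raw = json.dumps(doc).encode()
read_raw = lambda: raw

t_encode = timeit.timeit(read_with_encode, number=1000)
t_raw = timeit.timeit(read_raw, number=1000)
print(f"re-encode per read: {t_encode:.4f}s, raw bytes: {t_raw:.4f}s")
```

The per-read serialization cost in strategy 1 grows with document size, which
matches the heavy CPU usage Jonathan observed on raw reads.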
