incubator-couchdb-user mailing list archives

From: Jonathan Williamson <...@netcopy.co.uk>
Subject: Re: Why are reads from CouchDB so slow? (1.5MB/s or thereabouts)
Date: Fri, 23 Mar 2012 12:15:05 GMT
Jason,

Apologies if I came across as too demanding - I didn't mean to! I
just want to understand why reads are this slow so that we can make a
sensible decision about moving forward with or without CouchDB in the
long term.

My initial comparison was actually to reading a file off disk, but I
thought that unfair, so I added the overhead of a web server. That's
not to say I'd expect CouchDB to match the performance of Nginx 1:1,
but currently it is 1 to 2 orders of magnitude slower for the task I
described.

It's interesting to hear that CouchDB compares the document to a
checksum prior to serving it. Do you have any idea what overhead this
adds, and what the reasoning behind it is? (Data could equally be
corrupted in transmission, or in memory after checksumming, etc.)
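
For reference, here is the sort of quick measurement I had in mind
(a Python sketch; MD5 is purely my assumption, I don't know which
digest CouchDB actually uses):

    # Rough cost of a digest over a 1 MB stand-in document.
    import hashlib, time

    doc = b"x" * (1024 * 1024)
    start = time.time()
    for _ in range(100):
        hashlib.md5(doc).digest()
    print("digest: %.0f MB/s" % (100 / (time.time() - start)))

If the digest alone runs at hundreds of MB/s, it can't be the main
bottleneck here, which is why I'm curious about the codec overhead too.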

The main reasons I would expect CouchDB to be fast at this specific
operation are:

- Prebuilt indexes: I was surprised these didn't allow CouchDB to
very quickly identify where to retrieve the document from within its
datafiles.
- Internal storage format: It seems to be almost raw JSON in the
files with a bit of metadata, which should allow for (almost) direct
streaming to the client (see the sketch below).
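
To illustrate what I mean, here is the read path I'd naively expect
(a hypothetical sketch; the index layout and names are my invention,
not CouchDB's actual on-disk format):

    # Hypothetical: a prebuilt index maps doc id -> (offset, length)
    # in the data file, and the stored bytes are served unmodified.
    def serve_doc(index, datafile, doc_id, out):
        offset, length = index[doc_id]    # index lookup
        datafile.seek(offset)
        out.write(datafile.read(length))  # stream bytes as-is

Under that model the per-document cost is one lookup, one seek, and
one sequential read, which is why the gap to Nginx surprises me.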

It's not that CouchDB is slower than Nginx per se, it's that it is so
massively slower. For example, having flushed the file caches on my
dev box, Nginx can serve a large static file at 83MB/s (which is 25
times faster than CouchDB on the same hardware).
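
If you want to reproduce the comparison, a crude way to measure
download throughput (the URL is of course specific to my setup):

    # Crude download-throughput measurement (Python 3's urllib).
    import time, urllib.request

    start = time.time()
    body = urllib.request.urlopen("http://localhost/bigfile.bin").read()
    secs = time.time() - start
    print("%.1f MB/s" % (len(body) / secs / 1024 / 1024))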

On Fri, Mar 23, 2012 at 11:56 AM, Jason Smith <jhs@iriscouch.com> wrote:
> CouchDB verifies that the document contents match a checksum, which
> does impose computation and codec overhead, yes.
>
> Considering that CouchDB stores multiple sorted indices to the
> documents in a database which is itself a filesystem file, in a safe
> append-only format, how would you justify an expectation of static
> Nginx performance? Surely CouchDB must open the file (right there you
> have at best tied Nginx) and then seek through its metadata to fetch
> the doc. Note that my disagreement with you is not fundamental, just one of
> degree. Surely it is fair to give CouchDB some elbow room to work, to
> pay for its benefits?
>
> Back to document comprehension: CouchDB does parse and re-serialize
> documents, and this is a huge opportunity for improvement. I believe
> Filipe has indeed proposed something much like you describe: store
> the utf-8 JSON directly on disk.
>
> I'm excited that this conversation can paint a clearer picture of
> what we expect from CouchDB, and find a speed at which we could say,
> "this is slower than Brand X, but it's worth it."
>
> On Fri, Mar 23, 2012 at 11:41 AM, Jonathan Williamson <jon@netcopy.co.uk> wrote:
>> As I'm requesting the documents in the exact format in which I
>> submitted them (with no transformations or extra information), I'd
>> expect something not far off a static file request from Nginx. As far as I can tell the
>> .couch files aren't compressed (though that wouldn't cause such slow
>> performance on an i5 anyway) and appear to contain the original
>> documents almost "as is".
>>
>> The other side effect is that while fetching the documents the CPU
>> usage rises to 100%, which suggests, I guess, that CouchDB is
>> reading, deserialising, serialising, and then streaming the
>> document. But it doesn't seem like that should really be necessary?
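>>
>> As a quick sanity check of that theory, here's a rough timing sketch
>> (Python as a stand-in; CouchDB is Erlang, so the absolute numbers
>> will differ, and the document below is made up):
>>
>>     # How expensive is parse + re-serialize on a ~1 MB JSON doc?
>>     import json, time
>>
>>     doc = json.dumps({"key%d" % i: "value" * 50 for i in range(4000)})
>>     start = time.time()
>>     for _ in range(100):
>>         json.dumps(json.loads(doc))  # deserialise, then serialise again
>>     elapsed = time.time() - start
>>     print("round-trip: %.0f MB/s" % (len(doc) * 100 / elapsed / 1e6))
>>
>> If that round-trip rate comes out well below disk speed, it would be
>> consistent with the 100% CPU I'm seeing.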
