couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Why are reads from CouchDB so slow? (1.5MB/s or thereabouts)
Date Fri, 23 Mar 2012 12:26:33 GMT
Are you really extrapolating MB/s from a single curl command?
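
A single curl run mostly measures connection setup and cold caches.
Something like this rough sketch averages over many requests (the URL
and document id are placeholders):

    import time
    import urllib.request

    url = "http://localhost:5984/mydb/mydoc"  # placeholder db/doc
    n, total_bytes = 100, 0
    start = time.time()
    for _ in range(n):
        total_bytes += len(urllib.request.urlopen(url).read())
    secs = time.time() - start
    print("%.2f MB/s over %d requests" % (total_bytes / secs / 1e6, n))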

On 23 March 2012 12:15, Jonathan Williamson <jon@netcopy.co.uk> wrote:
> Jason,
>
> Apologies if I came off too demanding - I don't mean to be! I just
> want to understand why so that we can make a sensible decision as to
> how to move forward with or without CouchDB in the long term.
>
> My initial comparison was actually to reading a file off disk, but I
> thought that unfair so added the overhead of a web server. That's not
> to say I'd expect CouchDB to match the performance of Nginx 1:1, but
> currently it is 1 to 2 orders of magnitude slower for the task I
> described.
>
> It's interesting to hear that CouchDB compares the document to a
> checksum prior to serving it; do you have any idea what overhead this
> adds? What's the reasoning behind it? (I mean, data could be corrupted
> in transmission, or in memory after checksumming, etc.)
>
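> If it's any use, here's how I'd ballpark that overhead (rough sketch:
> a 1 MB dummy body, and MD5 as a stand-in for whatever digest CouchDB
> actually uses):
>
>     import hashlib
>     import time
>
>     body = b"x" * (1024 * 1024)  # stand-in for a 1 MB document
>     start = time.time()
>     for _ in range(100):
>         hashlib.md5(body).digest()
>     print("%.0f MB/s through md5" % (100 / (time.time() - start)))
>
> I'd expect a loop like that to run at several hundred MB/s on an i5,
> so the checksum alone shouldn't explain the gap.
>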
> The main reasons I would expect CouchDB to be fast at this specific
> operation are:
>
> - Prebuilt indexes: I was surprised this didn't allow CouchDB to very
> quickly identify where to retrieve the document from within its
> datafiles.
> - Internal storage format: It seems to be almost raw JSON in the files
> with a bit of metadata, which should allow for (almost) direct
> streaming to the client? (Roughly what I picture in the sketch below.)
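>
> In other words, what I naively picture is something like this (pure
> pseudocode - btree_lookup and stream_to_client are invented names):
>
>     offset, size = btree_lookup(doc_id)  # hypothetical index lookup
>     with open("mydb.couch", "rb") as f:
>         f.seek(offset)
>         stream_to_client(f.read(size))   # bytes out as-is, no codec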
>
> It's not that CouchDB is slower than Nginx per se, it's that it's
> massively slower. For example, having flushed the file caches on my
> dev box, Nginx can serve a large static file at 83MB/s (which is 25
> times faster than CouchDB on the same hardware).
>
> On Fri, Mar 23, 2012 at 11:56 AM, Jason Smith <jhs@iriscouch.com> wrote:
>> CouchDB verifies that the document contents match a checksum which
>> does impose computation and codec overhead, yes.
>>
>> Considering that CouchDB stores multiple sorted indices to the
>> documents in a database which is itself a filesystem file, in a safe
>> append-only format, how would you justify an expectation of static
>> Nginx performance? Surely CouchDB must open the file (right there you
>> have tied Nginx at best) and then seek through its metadata to fetch
>> the doc. Note, my disagreement with you is not fundamental, just of
>> degree. Surely it is fair to give CouchDB some elbow room to work, to
>> pay for its benefits?
>>
>> Back to document parsing: CouchDB does decode and re-encode
>> documents, and this is a huge opportunity for improvement. I believe
>> Filipe has indeed proposed something much like you describe: store
>> the UTF-8 JSON directly on disk.
>>
>> I'm excited that this conversation can paint a clearer picture of
>> what we expect from CouchDB, to find a speed at which we could say,
>> "this is slower than Brand X, but it's worth it."
>>
>> On Fri, Mar 23, 2012 at 11:41 AM, Jonathan Williamson <jon@netcopy.co.uk> wrote:
>>> As I'm requesting the documents in the exact format I submitted them
>>> (with no transformations or extra information) I'd expect something
>>> not far off a static file request from Nginx. As far as I can tell the
>>> .couch files aren't compressed (though that wouldn't cause such slow
>>> performance on an i5 anyway) and appear to contain the original
>>> documents almost "as is".
>>>
>>> The other side effect is that while fetching the documents the CPU
>>> usage rises to 100%, which suggests, I guess, that CouchDB is
>>> reading, deserialising, serialising, and then streaming the document.
>>> But it doesn't seem like that should really be necessary?
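>>>
>>> A cheap way to see that codec cost, using Python's json module as a
>>> stand-in for whatever CouchDB's Erlang codec does (doc.json is a
>>> placeholder for one of my test documents):
>>>
>>>     import json
>>>     import time
>>>
>>>     raw = open("doc.json", "rb").read()
>>>     start = time.time()
>>>     for _ in range(100):
>>>         json.dumps(json.loads(raw))  # decode + re-encode round trip
>>>     print("%.3fs per round trip" % ((time.time() - start) / 100))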
