couchdb-replication mailing list archives

From Dave Cottlehuber <...@jsonified.com>
Subject Re: _bulk_get protocol extension
Date Fri, 24 Jan 2014 21:51:58 GMT
Hey Jens,

that looks interesting indeed. Worth posting a JIRA ticket with the
link, so it doesn't get lost in email.

A+
Dave

On 24 January 2014 16:20, Jens Alfke <jens@couchbase.com> wrote:
> (I'm excited about this list! There have been some topics I've wanted
> to bring up that are too implementation-oriented for the user@ list,
> but I haven't been brave enough to dive into the dev@ list because I
> don't know Erlang or the internals of CouchDB. I also really appreciate
> folks sharing the viewpoint that CouchDB is an ecosystem and an open
> replication protocol, not just a particular database implementation.)
>
> Anyway. One topic I'd like to bring up is that, in my non-scientific
> observations, the major performance bottleneck in pull replications is
> the fact that revisions have to be transferred using individual GET
> requests. I've seen very poor performance when pulling lots of small
> documents from a distant server, like an order of magnitude below the
> throughput of sending a single huge document.
>
> (Yes, it's possible to get multiple revisions at once by POSTing to
> _all_docs. Unfortunately this has limitations that make it unsuitable
> for replication; see my explanation at the page linked below.)
>
> A few months ago I experimentally implemented a new "_bulk_get" REST
> call in Couchbase's replicators (Couchbase Lite and the Sync Gateway),
> which significantly improves performance by allowing the puller to
> request any number of revisions in a single HTTP request. Again, no
> scientific tests or hard numbers, but it was enough to convince me it's
> worthwhile. I've documented it here:
>         https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with
> the standard API. The only unusual thing is that the response can
> contain nested MIME multipart bodies: the response format is multipart,
> with every requested revision in a part, but revisions containing
> attachments are themselves sent as multipart. (This shouldn't be an
> issue for any decent multipart parser, since nested multipart is pretty
> common in emails, but I think it's the first time it's happened in the
> CouchDB API.)
>
> I'd be happy if this were implemented in CouchDB and made an official
> part of the API. Hopefully the spec I wrote is detailed enough to make
> that straightforward. (I don't have the Erlang skills to do it myself,
> though.)
>
> —Jens
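
To illustrate the nested multipart layout described in the quoted message, here is a minimal Python sketch using only the standard library's MIME parser. It is not taken from the Bulk-GET spec: the part types (application/json, multipart/related), the sample document fields, and the helper names build_sample_response and extract_revisions are assumptions made for illustration. A real client would also need to prepend the HTTP response's Content-Type header to the body before handing it to the parser.

```python
import json
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def build_sample_response() -> bytes:
    """Build a stand-in for a multipart response (hypothetical contents)."""
    outer = MIMEMultipart("mixed")

    # Revision without attachments: a single JSON part.
    doc1 = {"_id": "doc1", "_rev": "1-abc"}
    outer.attach(MIMEApplication(json.dumps(doc1).encode(), "json"))

    # Revision with an attachment: itself a multipart body, nested
    # inside the outer multipart body.
    inner = MIMEMultipart("related")
    doc2 = {"_id": "doc2", "_rev": "2-def"}
    inner.attach(MIMEApplication(json.dumps(doc2).encode(), "json"))
    inner.attach(MIMEApplication(b"hello", "octet-stream"))
    outer.attach(inner)

    return outer.as_bytes()


def extract_revisions(raw: bytes):
    """Return a list of (document, [attachment bytes]) tuples."""
    msg = message_from_bytes(raw)
    if not msg.is_multipart():
        raise ValueError("expected a multipart body")

    revisions = []
    for part in msg.get_payload():
        if part.is_multipart():
            # Nested case: assume the first sub-part is the JSON document
            # and the remaining sub-parts are its attachments.
            subparts = part.get_payload()
            doc = json.loads(subparts[0].get_payload(decode=True))
            attachments = [p.get_payload(decode=True) for p in subparts[1:]]
        else:
            doc = json.loads(part.get_payload(decode=True))
            attachments = []
        revisions.append((doc, attachments))
    return revisions


if __name__ == "__main__":
    for doc, attachments in extract_revisions(build_sample_response()):
        print(doc["_id"], doc["_rev"], len(attachments))
```

The only special handling the nesting needs is the is_multipart() check on each outer part; the parser itself keeps track of the separate boundaries.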
