couchdb-replication mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject _bulk_get protocol extension
Date Fri, 24 Jan 2014 15:20:55 GMT
(I'm excited about this list! There have been some topics I've wanted to bring up that are
too implementation-oriented for the user@ list, but I haven't been brave enough to dive into
the dev@ list because I don't know Erlang or the internals of CouchDB. I also really appreciate
folks sharing the viewpoint that CouchDB is an ecosystem and an open replication protocol,
not just a particular database implementation.)

Anyway. One topic I'd like to bring up is that, in my non-scientific observations, the major
performance bottleneck in pull replications is the fact that revisions have to be transferred
using individual GET requests. I've seen very poor performance when pulling lots of small
documents from a distant server: roughly an order of magnitude below the throughput of
transferring a single huge document.

(Yes, it's possible to get multiple revisions at once by POSTing to _all_docs. Unfortunately
this has limitations that make it unsuitable for replication; see my explanation at the page
linked below.)

A few months ago I experimentally implemented a new "_bulk_get" REST call in Couchbase's replicators
(Couchbase Lite and the Sync Gateway), which significantly improves performance by allowing
the puller to request any number of revisions in a single HTTP request. Again, no scientific
tests or hard numbers, but it was enough to convince me it's worthwhile. I've documented it
here:
	https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
It's pretty straightforward and I've tried to make it consistent with the standard API. The
only unusual thing is that the response can contain nested MIME multipart bodies: the response
format is multipart, with every requested revision in a part, but revisions containing attachments
are themselves sent as multipart. (This shouldn't be an issue for any decent multipart parser,
since nested multipart is pretty common in emails, but I think it's the first time it's happened
in the CouchDB API.)
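
To illustrate the nested-multipart point, here is a minimal Python sketch of how a client might walk such a response using only the standard library's email parser. It is not taken from the spec: the function name, the sample boundaries, the content types (multipart/mixed outside, multipart/related inside), and the assumption that the first sub-part of a nested revision is the JSON document with attachments following it are all illustrative guesses. Check the wiki page for the actual format.

```python
import json
from email import message_from_bytes


def parse_bulk_get_response(content_type, body):
    """Split a multipart response into one entry per top-level part.

    Hypothetical helper: returns a list of {"doc": ..., "attachments": [...]}.
    """
    # The email parser wants headers, a blank line, then the body, so we
    # prepend the HTTP Content-Type header (it carries the boundary).
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart response")

    results = []
    for part in message.get_payload():
        if part.is_multipart():
            # Revision with attachments: itself a multipart body.
            # ASSUMPTION: first sub-part is the JSON document, the rest
            # are attachment bodies.
            subparts = part.get_payload()
            doc = json.loads(subparts[0].get_payload(decode=True))
            attachments = [p.get_payload(decode=True) for p in subparts[1:]]
        else:
            # Revision without attachments: a plain JSON part.
            doc = json.loads(part.get_payload(decode=True))
            attachments = []
        results.append({"doc": doc, "attachments": attachments})
    return results


# Made-up sample body: one plain revision and one revision with an attachment.
SAMPLE_BODY = (
    b"--outer\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"_id": "doc1", "_rev": "1-abc"}\r\n'
    b"--outer\r\n"
    b'Content-Type: multipart/related; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"_id": "doc2", "_rev": "2-def"}\r\n'
    b"--inner\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--inner--\r\n"
    b"--outer--\r\n"
)

if __name__ == "__main__":
    parsed = parse_bulk_get_response('multipart/mixed; boundary="outer"', SAMPLE_BODY)
    for entry in parsed:
        print(entry["doc"]["_id"], len(entry["attachments"]))
```

The only extra work compared with a flat multipart response is the is_multipart() check on each part; a parser that already handles nested email bodies needs nothing special.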

I'd be happy if this were implemented in CouchDB and made an official part of the API. Hopefully
the spec I wrote is detailed enough to make that straightforward. (I don't have the Erlang
skills to do it myself, though.)

—Jens