couchdb-replication mailing list archives

From Yaron Goland <yar...@microsoft.com>
Subject RE: _bulk_get protocol extension
Date Fri, 24 Jan 2014 17:35:21 GMT
More than a decade ago, issues like this came up in the HTTP WG under the name 'boxcarring'.
But once pipelining was introduced, the performance benefits of boxcarring idempotent
requests went away.

In a replication, the source should be able to fire GET requests down the pipeline non-stop,
and the remote server should be able to answer them just as quickly. So have you identified
why you are seeing bad performance?
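
To make the pipelining point concrete, here is a minimal sketch of what "firing GET requests
down the pipeline" looks like on the wire: several requests written back to back on one
connection, with no wait for a response in between. The helper name `pipelined_gets`, the host,
and the database name are hypothetical; the `GET /db/docid?rev=...` form is the standard
single-revision fetch. It only builds the bytes and does not open a socket.

```python
from urllib.parse import quote


def pipelined_gets(host, db, revisions):
    """Return the bytes of one GET request per (doc_id, rev), back to back.

    Writing this whole buffer to a single keep-alive connection before
    reading anything is what HTTP/1.1 pipelining amounts to. The server
    must answer in the same order the requests were sent.
    """
    requests = []
    for doc_id, rev in revisions:
        # Percent-encode each component so unusual IDs stay valid in a URL.
        path = "/{}/{}?rev={}".format(
            quote(db, safe=""), quote(doc_id, safe=""), quote(rev, safe="")
        )
        requests.append(
            "GET {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Accept: application/json\r\n"
            "\r\n".format(path, host)
        )
    return "".join(requests).encode("ascii")


if __name__ == "__main__":
    # Hypothetical host, database, and revision IDs, for illustration only.
    wire = pipelined_gets("example.org", "db", [("a", "1-x"), ("b", "2-y")])
    print(wire.decode("ascii"))
```

Whether this actually performs well depends on the client library, the server, and any proxies
in between all supporting pipelining, which is worth checking when diagnosing the slowdown.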

	Thanks,

			Yaron

> -----Original Message-----
> From: Jens Alfke [mailto:jens@couchbase.com]
> Sent: Friday, January 24, 2014 7:21 AM
> To: replication@couchdb.apache.org
> Subject: _bulk_get protocol extension
> 
> (I'm excited about this list! There have been some topics I've wanted to bring
> up that are too implementation-oriented for the user@ list, but I haven't
> been brave enough to dive into the dev@ list because I don't know Erlang or
> the internals of CouchDB. I also really appreciate folks sharing the viewpoint
> that CouchDB is an ecosystem and an open replication protocol, not just a
> particular database implementation.)
> 
> Anyway. One topic I'd like to bring up is that, in my non-scientific
> observations, the major performance bottleneck in pull replications is the
> fact that revisions have to be transferred using individual GET requests. I've
> seen very poor performance when pulling lots of small documents from a
> distant server, like an order of magnitude below the throughput of sending a
> single huge document.
> 
> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs.
> Unfortunately this has limitations that make it unsuitable for replication; see
> my explanation at the page linked below.)
> 
> A few months ago I experimentally implemented a new "_bulk_get" REST call
> in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which
> significantly improves performance by allowing the puller to request any
> number of revisions in a single HTTP request. Again, no scientific tests or hard
> numbers, but it was enough to convince me it's worthwhile. I've
> documented it here:
> 	https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the
> standard API. The only unusual thing is that the response can contain nested
> MIME multipart bodies: the response format is multipart, with every
> requested revision in a part, but revisions containing attachments are
> themselves sent as multipart. (This shouldn't be an issue for any decent
> multipart parser, since nested multipart is pretty common in emails, but I
> think it's the first time it's happened in the CouchDB API.)
> 
> I'd be happy if this were implemented in CouchDB and made an official part
> of the API. Hopefully the spec I wrote is detailed enough to make that
> straightforward. (I don't have the Erlang skills to do it myself, though.)
> 
> -Jens
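
For readers who want a feel for the proposal quoted above, here is a rough sketch of the two
halves: a request that names many revisions at once, and a response whose multipart body can
contain another multipart body. Everything specific here is an assumption rather than a quote
from the spec: the `{"docs": [{"id": ..., "rev": ...}]}` body shape, the `multipart/mixed` and
`multipart/related` subtypes, and the helper names are illustrative, and the linked wiki page
is the authority. The sample response is built locally, so nothing touches the network.

```python
import json
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def bulk_get_body(revisions):
    """JSON request body listing every wanted (doc_id, rev). Assumed shape."""
    return json.dumps({"docs": [{"id": d, "rev": r} for d, r in revisions]})


def sample_response():
    """Build a stand-in response: one plain revision, one with an attachment."""
    outer = MIMEMultipart("mixed")
    outer.attach(MIMEApplication(b'{"_id":"a","_rev":"1-x"}', "json"))
    # A revision with an attachment is itself multipart: JSON plus raw bytes.
    inner = MIMEMultipart("related")
    inner.attach(MIMEApplication(b'{"_id":"b","_rev":"2-y"}', "json"))
    inner.attach(MIMEApplication(b"raw attachment bytes", "octet-stream"))
    outer.attach(inner)
    return outer.as_bytes()


def leaf_parts(raw):
    """Return (content type, decoded payload) for every non-multipart part.

    `raw` must start with MIME headers. A real client would first prepend
    the Content-Type header taken from the HTTP response.
    """
    msg = message_from_bytes(raw)
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in msg.walk()  # walk() descends into nested multiparts
        if not part.is_multipart()
    ]


if __name__ == "__main__":
    print(bulk_get_body([("a", "1-x"), ("b", "2-y")]))
    for content_type, payload in leaf_parts(sample_response()):
        print(content_type, payload)
```

A stock MIME parser handles the nesting without special code, which matches the point in the
quoted message. Note that flattening the parts this way discards which attachment belongs to
which revision; a real client would keep the nesting.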
