couchdb-replication mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject Re: _bulk_get protocol extension
Date Tue, 28 Jan 2014 05:12:57 GMT

On Jan 27, 2014, at 7:26 PM, Yaron Goland <yarong@microsoft.com> wrote:

> Nevertheless he did say that so long as one probes the connection then pipelining is
> known to work. Probing just means that you can't assume that the server you are talking to
> is a 1.1 server and therefore supports pipelining.

Well, yes, that's pretty clear — I mean, I know pipelining's been implemented. (And on iOS
and Mac the frameworks already know how to support pipelining, so one doesn't have to do the
probing oneself.)

The problems with pipelining are higher level than that. Did you read the text by Ilya Grigorik
that I linked to? Here's another excerpt:

	• A single slow response blocks all requests behind it.
	• When processing in parallel, servers must buffer pipelined responses, which may exhaust
server resources—e.g., what if one of the responses is very large? This exposes an attack
vector against the server!
	• A failed response may terminate the TCP connection, forcing the client to re-request
all the subsequent resources, which may cause duplicate processing.
	• Detecting pipelining compatibility reliably, where intermediaries may be present, is
a nontrivial problem.
	• Some intermediaries do not support pipelining and may abort the connection, while others
may serialize all requests.
— http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING
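
To make the first bullet concrete, here is a toy model (not a measurement). It assumes the server works on all pipelined requests in parallel but has to send the responses back in request order, and it ignores transfer time. The service times and the function name are made up for illustration.

```python
from itertools import accumulate

# Hypothetical per-request service times in seconds: one slow request
# followed by three fast ones.
service_times = [5.0, 0.1, 0.1, 0.1]


def in_order_completion(times):
    """Completion times when responses must be delivered in request order.

    A response cannot be sent before every earlier response has been sent,
    so each one completes at the running maximum of the service times so far.
    """
    return list(accumulate(times, max))


# Pipelining: the fast responses wait behind the slow one.
pipelined = in_order_completion(service_times)

# Independent delivery (e.g. multiplexing): each response completes
# as soon as it is ready.
independent = list(service_times)

print("pipelined:  ", pipelined)
print("independent:", independent)
```

In this model every fast response is held until the 5-second one has gone out, which is the head-of-line blocking the excerpt describes.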

(Now, HTTP 2.0 is adding multiplexing, which alleviates most of those problems. I'll be happy
when we get to use it, but that probably won't be for a year or two at least.)

I also mentioned the overhead of issuing a bunch of HTTP requests versus just one. As a thought
experiment, consider fetching a one-megabyte HTTP resource by using a thousand byte-range
GET requests each requesting 1K of the file. Would this take longer than issuing a single
GET request for the entire resource? Yeah, and probably a lot longer, even with pipelining.
The client and the server both introduce overhead in handling requests.
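
A rough back-of-the-envelope version of that thought experiment, counting only bytes on the wire. The header sizes are assumptions picked for illustration (real ones vary a lot by client and server), and this leaves out the per-request CPU and latency costs entirely, so if anything it understates the difference.

```python
RESOURCE_BYTES = 1_000_000     # the one-megabyte resource (decimal megabyte assumed)
CHUNK_BYTES = 1_000            # "1K" per byte-range request

# Assumed, illustrative header sizes; not measured from any real client or server.
REQUEST_HEADER_BYTES = 300
RESPONSE_HEADER_BYTES = 300


def bytes_on_wire(num_requests: int) -> int:
    """Body bytes plus request and response header bytes for every request."""
    per_request = REQUEST_HEADER_BYTES + RESPONSE_HEADER_BYTES
    return RESOURCE_BYTES + num_requests * per_request


single = bytes_on_wire(1)
ranged = bytes_on_wire(RESOURCE_BYTES // CHUNK_BYTES)

print("single GET:      ", single, "bytes")
print("1000 range GETs: ", ranged, "bytes")
print("ratio:           ", round(ranged / single, 2))
```

Under these assumed sizes the thousand range requests move about 1.6 MB to deliver 1 MB of content, before counting any request-handling time on either end.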

Finally, consider that putting a number of related resources together into a single body enables
better compression, since general-purpose compression algorithms look for repeated patterns.
If I have a thousand small documents each of which contains a property named "this_is_my_custom_property",
then if all those documents are returned in one response, each instance of that string after the
first will get compressed down to a very short token. If they're separate responses, each one is
compressed on its own, so the string gets sent in full every time.
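
A quick way to see the effect, as a sketch: it uses zlib (DEFLATE, the algorithm behind gzip) and a thousand made-up JSON documents. The document shape and the "_id" values are hypothetical; only the property name comes from the example above.

```python
import json
import zlib

# Hypothetical documents: 1000 small JSON objects sharing one long property name.
docs = [
    json.dumps({"_id": f"doc{i:04d}", "this_is_my_custom_property": i}).encode("utf-8")
    for i in range(1000)
]

# One response carrying every document, compressed as a single stream.
combined_size = len(zlib.compress(b"\n".join(docs)))

# One response per document, each compressed independently.
separate_size = sum(len(zlib.compress(d)) for d in docs)

print("raw total:", sum(len(d) for d in docs), "bytes")
print("combined: ", combined_size, "bytes")
print("separate: ", separate_size, "bytes")
```

The exact numbers depend on the zlib build and the document contents, but the combined stream should come out much smaller, because the compressor can refer back to earlier documents instead of starting from scratch each time.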

—Jens