couchdb-dev mailing list archives

From Adam Kocoloski
Subject Re: Attachment Replication Problem - Bug Found
Date Sun, 17 May 2009 13:27:20 GMT
On May 16, 2009, at 8:30 PM, Antony Blakey wrote:

> On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:
>> So, I think there's still some confusion here.  By "open  
>> connections" do you mean TCP connections to the source?  That  
>> number is never higher than 10.  ibrowse does pipeline requests on  
>> those 10 connections, so there could be as many as 1000  
>> simultaneous HTTP requests.  However, those requests complete as  
>> soon as the data reaches the ibrowse client process, so in fact the  
>> number of outstanding requests during replication is usually very  
>> small.  We're not doing flow control at the TCP socket layer.
> OK, I understand that now. That means that a document with > 1000  
> attachments can't be replicated because ibrowse will never send  
> ibrowse_async_headers for the excess attachments to attachment_loop,  
> which needs to happen for every attachment before any of the data is  
> read by doc_flush_binaries. In other words, every document  
> attachment needs to start (i.e. receive its headers) before any  
> attachment bodies are consumed.

Not quite.  So, this discussion is going to quickly become even more  
confusing because as of yesterday attachments are downloaded on  
dedicated connections outside the load-balanced connection pool.  For  
the sake of argument let's stick with the behavior as of 2 days ago at  

I keep coming back to this key point: _ibrowse has no flow control_.   
It doesn't matter whether we consume the ibrowse_async_headers message  
in the attachment receiver or not; ibrowse is still going to  
immediately send all those ibrowse_async_response messages our way.
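To make the "no flow control" point concrete, here is a toy Python model. It is not ibrowse code: the function name and the "consumer handles one message per N delivered" behaviour are assumptions for illustration. With no flow control the sender keeps delivering into an unbounded mailbox, so a slow consumer only changes how much piles up in memory. With a window, the sender stalls once `window` messages are unread.

```python
from collections import deque


def peak_backlog(n_messages, consume_every, window=None):
    """Toy model of a sender pushing messages at a slow consumer.

    The consumer handles one message for every `consume_every` delivered.
    With window=None there is no flow control: every message goes straight
    into an unbounded mailbox (as with ibrowse's async messages). With a
    window, the sender waits until fewer than `window` messages are unread.
    Returns the largest number of unread messages seen.
    """
    mailbox = deque()
    peak = 0
    for i in range(1, n_messages + 1):
        if window is not None:
            while len(mailbox) >= window:
                mailbox.popleft()  # sender stalls while the consumer catches up
        mailbox.append(i)
        peak = max(peak, len(mailbox))
        if i % consume_every == 0:
            mailbox.popleft()  # slow consumer handles one message
    return peak


print(peak_backlog(1000, consume_every=4))             # backlog grows to 751
print(peak_backlog(1000, consume_every=4, window=10))  # never more than 10
```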

Now, your point about limits on the number of attachments in a  
document is a good one.  What I imagine would happen is the following:

1) couch_rep spawns off 1000+ attachment requests to ibrowse for a  
single document
2) ibrowse starts sending back {error, retry_later} responses when the  
queue is full
3) the attachment receiver processes start exiting with  
4) couch_rep traps the exits and reboots the document enumerator  
starting at current_seq
5) repeat
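The loop above can be sketched as a small simulation. This is a toy model, not couch_rep code. The 10 connections come from the thread. The per-connection pipeline depth of 100 is an assumption derived from the "10 connections, up to 1000 requests" figure. The function names are hypothetical. A document with more attachments than the pool can queue keeps getting retried from current_seq and never completes.

```python
# Toy model of the failure loop described above; not couch_rep code.
MAX_CONNECTIONS = 10   # TCP connections to the source, per the thread
MAX_PIPELINE = 100     # assumed per-connection pipeline depth (10 x 100 = 1000)
CAPACITY = MAX_CONNECTIONS * MAX_PIPELINE


def fetch_attachments(n_attachments):
    """Issue all attachment requests for one document at once.

    Requests beyond the pool's capacity get {error, retry_later}."""
    return ["ok" if i < CAPACITY else "retry_later" for i in range(n_attachments)]


def replicate(docs, max_restarts=5):
    """docs[i] is the attachment count of the document at seq i.

    Returns (first seq not replicated, number of enumerator restarts)."""
    current_seq = 0
    restarts = 0
    while current_seq < len(docs):
        if "retry_later" in fetch_attachments(docs[current_seq]):
            # Steps 3-4: a receiver exits, the exit is trapped, and the
            # enumerator restarts at current_seq, i.e. the same document.
            restarts += 1
            if restarts > max_restarts:
                break  # step 5 would repeat forever; cap it for the demo
            continue
        current_seq += 1
    return current_seq, restarts


print(replicate([3, 999, 5]))   # (3, 0): every document fits
print(replicate([3, 1200, 5]))  # (1, 6): stuck on the second document
```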

Obviously this is not a good situation.  Now, I mentioned earlier that  
as of yesterday the attachment downloads are each done on dedicated  
connections.  I pulled them out of the connection pool so that a  
document download didn't get stuck behind a giant attachment download  
(that would be one way to make couch run out of memory).   
This means that the max_connections x max_pipeline limit doesn't apply to  
attachments.  Of course, using dedicated connections has its own  
scalability problems.  Setting up and tearing down all of those  
connections for the "lots of small attachments" case introduces a  
significant cost, and eventually we could have so many connections in  
TIME_WAIT that we run out of ephemeral ports.
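A rough back-of-envelope for the TIME_WAIT concern. It assumes Linux defaults: an ephemeral port range of 32768-60999 and a 60 s TIME_WAIT. Both vary by kernel and OS. It also assumes the replicating side closes each connection first, so that side holds the TIME_WAIT. The helper names are hypothetical.

```python
# Back-of-envelope only. Both constants are assumed Linux defaults and
# differ across kernels and operating systems.
EPHEMERAL_PORTS = 60999 - 32768 + 1  # net.ipv4.ip_local_port_range "32768 60999"
TIME_WAIT_SECONDS = 60               # Linux keeps a closed socket in TIME_WAIT ~60 s


def sustainable_rate(ports=EPHEMERAL_PORTS, time_wait=TIME_WAIT_SECONDS):
    """Max new connections/s to ONE source host:port before ports run out,
    assuming the replicating side closes each connection first."""
    return ports / time_wait


def seconds_until_exhausted(conns_per_second, ports=EPHEMERAL_PORTS,
                            time_wait=TIME_WAIT_SECONDS):
    """None if the rate is sustainable, else how long until no port is free."""
    if conns_per_second <= sustainable_rate(ports, time_wait):
        return None
    return ports / conns_per_second


print(round(sustainable_rate()))      # about 471 new connections per second
print(seconds_until_exhausted(1000))  # about 28 s at 1000 small attachments/s
```

One dedicated connection per small attachment therefore caps sustained throughput at a few hundred attachments per second per source under these assumptions.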

A better solution might be to have a separate load-balanced connection  
pool just for attachments.  We'd have to exercise some care not to  
retry attachment requests on a connection that already has requests in  
the pipeline.
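One possible reading of that idea, sketched in Python. The class and method names are hypothetical, and so is the rule used here for "some care": new requests go to the least-loaded connection with room in its pipeline, while retries are only placed on a connection whose pipeline is empty. That way a retried request never queues behind, or fails along with, other in-flight requests.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    name: str
    pipeline: list = field(default_factory=list)  # outstanding request ids


class AttachmentPool:
    """Hypothetical load-balanced pool used only for attachment downloads."""

    def __init__(self, size, max_pipeline):
        self.conns = [Connection(f"att-{i}") for i in range(size)]
        self.max_pipeline = max_pipeline

    def send(self, req_id, retry=False):
        """Pick a connection for req_id; return its name, or None to wait.

        New requests go to the least-loaded connection with pipeline room.
        Retries only go to a connection with an empty pipeline.
        """
        if retry:
            candidates = [c for c in self.conns if not c.pipeline]
        else:
            candidates = [c for c in self.conns
                          if len(c.pipeline) < self.max_pipeline]
        if not candidates:
            return None  # caller waits and tries again later
        conn = min(candidates, key=lambda c: len(c.pipeline))
        conn.pipeline.append(req_id)
        return conn.name

    def complete(self, conn_name, req_id):
        """Mark req_id finished on the named connection."""
        for conn in self.conns:
            if conn.name == conn_name:
                conn.pipeline.remove(req_id)
                return
        raise KeyError(conn_name)


pool = AttachmentPool(size=2, max_pipeline=2)
print(pool.send("a"), pool.send("b"), pool.send("c"))  # att-0 att-1 att-0
print(pool.send("r", retry=True))                      # None: no idle connection
pool.complete("att-1", "b")
print(pool.send("r", retry=True))                      # att-1
```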

> In my case, I have some large attachments and unreliable links, so  
> I'm partial to a solution that allows progress even of partial  
> attachments during link failure. We could get this by not delaying  
> the attachments, and buffering them to disk, using range requests on  
> the GET for partial downloads. This would solve some problems  
> because it starts with the requirement to always make progress,  
> never redoing work. This seems like it could be done reasonably  
> transparently just by modifying the attachment download code.

I definitely like the idea of Range support for making progress in the  
event of link failure.  In theory, it would be possible to build this  
into ibrowse so we could transparently use it for very large documents  
as well.
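A minimal sketch of resuming with Range, assuming the source honours `Range: bytes=N-` with a 206 response and that the attachment's total length is known up front (for example from Content-Length). The unreliable link is simulated. `flaky_get` and the other helper names are hypothetical, not CouchDB or ibrowse APIs. Bytes that have already arrived are kept, and each retry asks only for the remainder.

```python
def range_header(offset):
    """Request everything from byte `offset` onwards (HTTP byte-range syntax)."""
    return {"Range": f"bytes={offset}-"}


def flaky_get(body, headers, drop_after):
    """Hypothetical stand-in for a source on an unreliable link.

    Honours 'Range: bytes=N-' with a 206 response, but the connection drops
    after `drop_after` bytes of each response.
    """
    if "Range" in headers:
        start = int(headers["Range"].split("=", 1)[1].rstrip("-"))
        status = 206
    else:
        start, status = 0, 200
    return status, body[start:start + drop_after]


def download_with_resume(fetch, total_length, max_attempts=100):
    """Keep what has arrived and ask only for the rest after each failure.

    `fetch(headers)` returns (status, bytes received before the link dropped).
    """
    received = bytearray()
    attempts = 0
    while len(received) < total_length and attempts < max_attempts:
        attempts += 1
        headers = range_header(len(received)) if received else {}
        status, chunk = fetch(headers)
        if received and status != 206:
            received.clear()  # source ignored Range: the body starts at byte 0
        received.extend(chunk)
    return bytes(received), attempts


body = bytes(range(256)) * 4  # a 1024-byte "attachment"
data, attempts = download_with_resume(
    lambda h: flaky_get(body, h, drop_after=300), len(body))
print(data == body, attempts)  # True 4
```

Without Range support the same link would never finish this attachment, because every attempt would restart at byte 0 and drop after 300 bytes.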

I'm not absolutely opposed to saving attachments to temporary files on  
disk, but I'd prefer to exhaust in-memory options first.

Cheers, Adam
