couchdb-dev mailing list archives

From Antony Blakey
Subject Re: Attachment Replication Problem - Bug Found
Date Mon, 18 May 2009 00:45:04 GMT

On 17/05/2009, at 9:27 PM, Adam Kocoloski wrote:

> On May 16, 2009, at 8:30 PM, Antony Blakey wrote:
>> On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:
>>> So, I think there's still some confusion here.  By "open  
>>> connections" do you mean TCP connections to the source?  That  
>>> number is never higher than 10.  ibrowse does pipeline requests on  
>>> those 10 connections, so there could be as many as 1000  
>>> simultaneous HTTP requests.  However, those requests complete as  
>>> soon as the data reaches the ibrowse client process, so in fact  
>>> the number of outstanding requests during replication is usually  
>>> very small.  We're not doing flow control at the TCP socket layer.
>> OK, I understand that now. That means that a document with > 1000  
>> attachments can't be replicated because ibrowse will never send  
>> ibrowse_async_headers for the excess attachments to  
>> attachment_loop, which needs to happen for every attachment before  
>> any of the data is read by doc_flush_binaries. Which is to say that  
>> every document attachment needs to start, i.e. receive headers,  
>> before any attachment bodies are consumed.
> Not quite.  So, this discussion is going to quickly become even more  
> confusing because as of yesterday attachments are downloaded on  
> dedicated connections outside the load-balanced connection pool.   
> For the sake of argument let's stick with the behavior as of 2 days  
> ago at first.
> I keep coming back to this key point: _ibrowse has no flow  
> control_.  It doesn't matter whether we consume the  
> ibrowse_async_headers message in the attachment receiver or not;  
> ibrowse is still going to immediately send all those  
> ibrowse_async_response messages our way.
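
The "no flow control" point can be illustrated with a toy model. This is only a sketch in Python, not ibrowse or couch_rep code: the mailbox is a plain deque, and the message tags are simplified stand-ins for ibrowse_async_headers and ibrowse_async_response. It assumes the sender pushes every message as soon as it has it, whether or not the receiver has read anything.

```python
from collections import deque

def push_response(mailbox, request_id, chunks):
    """Toy sender: delivers headers and every body chunk immediately.

    Nothing here waits for the receiver, which is the behaviour
    described in the thread (an assumption of this model, not a
    reading of the ibrowse source).
    """
    mailbox.append(("headers", request_id))
    for chunk in chunks:
        mailbox.append(("response", request_id, chunk))
    mailbox.append(("response_end", request_id))

def buffered_bytes(mailbox):
    """Bytes of body data sitting unread in the receiver's mailbox."""
    return sum(len(msg[2]) for msg in mailbox if msg[0] == "response")

# The receiver has consumed nothing, yet all body data has arrived.
mailbox = deque()
push_response(mailbox, 1, [b"aaaa", b"bbbb"])
push_response(mailbox, 2, [b"cccccccc"])
unread = buffered_bytes(mailbox)
```

In this model the unread data grows with the total size of all in-flight attachments, which is why leaving the headers message unconsumed does not hold anything back.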

Sure, my point was that once the queue is full it won't send the  
ibrowse_async_headers (because it will never start the connection). I  
didn't realise that it would fail before that (as you explain below).  
I was assuming it would just block. Hence all my previous comments.

> Now, your point about limits on the number of attachments in a  
> document is a good one.  What I imagine would happen is the following:
> 1) couch_rep spawns off 1000+ attachment requests to ibrowse for a  
> single document
> 2) ibrowse starts sending back {error, retry_later} responses when  
> the queue is full
> 3) the attachment receiver processes start exiting with  
> attachment_request_failed
> 4) couch_rep traps the exits and reboots the document enumerator  
> starting at current_seq
> 5) repeat
> Obviously this is not a good situation.  Now, I mentioned earlier  
> that as of yesterday the attachment downloads are each done on  
> dedicated connections.  I pulled them out of the connection pool so  
> that a document download didn't get stuck behind a giant attachment  
> download (the end result would be one way to make couch run out of  
> memory).  This means that the max_connections x max_pipeline limit  
> doesn't apply to attachments.  Of course, using dedicated connections has  
> its own scalability problems.  Setting up and tearing down all of  
> those connections for the "lots of small attachments" case  
> introduces a significant cost, and eventually we could have so many  
> connections in TIME_WAIT that we run out of ephemeral ports.
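
Steps 1 to 5 above can be sketched as a toy model. This is a hedged illustration in Python rather than the actual couch_rep logic: the function names are hypothetical, the 10 connections come from the thread, and the pipeline depth of 100 is inferred from the quoted figure of 1000 simultaneous requests.

```python
MAX_CONNECTIONS = 10  # from the thread: never more than 10 TCP connections
MAX_PIPELINE = 100    # inferred so that 10 x 100 matches the quoted 1000

def submit_all(n_attachments, capacity=MAX_CONNECTIONS * MAX_PIPELINE):
    """Queue every attachment request for one document up front.

    Returns (accepted, rejected); rejected requests stand in for the
    {error, retry_later} responses once the queue is full.
    """
    accepted = min(n_attachments, capacity)
    return accepted, n_attachments - accepted

def replicate_document(n_attachments, max_restarts=3):
    """Return the attempt number that succeeded, or None if none did.

    Any rejected request aborts the attempt and the enumerator restarts
    from the same sequence, so the same document is retried unchanged.
    """
    for attempt in range(1, max_restarts + 1):
        _accepted, rejected = submit_all(n_attachments)
        if rejected == 0:
            return attempt
    return None
```

Under these assumptions a document with 1000 attachments goes through on the first attempt, while one with 1001 never does, because each restart replays exactly the same set of requests against the same capacity.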

That new scalability problem is what I thought the original problem  
was with ibrowse before I learnt it had a pool.

> A better solution might be to have a separate load-balanced  
> connection pool just for attachments.  We'd have to exercise some  
> care not to retry attachment requests on a connection that already  
> has requests in the pipeline.
>> In my case, I have some large attachments and unreliable links, so  
>> I'm partial to a solution that allows progress even of partial  
>> attachments during link failure. We could get this by not delaying  
>> the attachments, and buffering them to disk, using range requests  
>> on the get for partial downloads. This would solve some problems  
>> because it starts with the requirement to always make progress,  
>> never redoing work. This seems like it could be done reasonably  
>> transparently just by modifying the attachment download code.
> I definitely like the idea of Range support for making progress in  
> the event of link failure.  In theory, it would be possible to build  
> this into ibrowse so we could transparently use it for very large  
> documents as well.
> I'm not absolutely opposed to saving attachments to temporary files  
> on disk, but I'd prefer to exhaust in-memory options first.

I'm pretty sure that the only scalable solution that will handle  
documents with significant numbers of attachments is to avoid having  
all the attachments in progress before the document is written, i.e.  
either buffering to disk or a more radical mod of allowing attachments  
to be written before the document, which I guess is not going to  
happen. And once you allow buffering to disk as a last resort, you may  
as well use it as the default mechanism. Apart from anything else,  
it's a good basis for partial attachment download.
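
A minimal sketch of that idea, in Python for illustration only: the partial file on disk records how far the download got, and each retry asks for the remainder with an HTTP Range header. The function names and the `fetch` callable are hypothetical; `fetch` stands in for whatever HTTP client issues the GET, and may return fewer bytes than remain if the link drops.

```python
import os

def bytes_on_disk(path):
    """Size of the partial download so far (0 if nothing buffered yet)."""
    return os.path.getsize(path) if os.path.exists(path) else 0

def range_header(path):
    """Range header requesting only the bytes not yet on disk."""
    return {"Range": "bytes=%d-" % bytes_on_disk(path)}

def resume_download(path, total_size, fetch, max_attempts=10):
    """Append to the partial file until it is complete.

    fetch(headers) -> bytes is an assumed interface. Work already on
    disk is never redone, so every successful read is progress.
    Returns True once the file holds total_size bytes.
    """
    for _ in range(max_attempts):
        if bytes_on_disk(path) >= total_size:
            return True
        data = fetch(range_header(path))
        with open(path, "ab") as partial:
            partial.write(data)
    return bytes_on_disk(path) >= total_size
```

The restart, file management and expiration policies mentioned below are deliberately left out; the sketch only shows why the on-disk buffer makes resumption cheap.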

I'm wondering if it's worth exhausting in-memory options if disk  
buffering is absolutely required for at least one use case?

The problem I see with building it into ibrowse is the requirement to  
inject the restart/file management/expiration policies into ibrowse.


Antony Blakey
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

In anything at all, perfection is finally attained not when there is  
no longer anything to add, but when there is no longer anything to  
take away.
   -- Antoine de Saint-Exupery
