couchdb-dev mailing list archives

From Antony Blakey <>
Subject Re: Attachment Replication Problem - Bug Found
Date Sun, 17 May 2009 00:30:49 GMT

On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:

> So, I think there's still some confusion here.  By "open  
> connections" do you mean TCP connections to the source?  That number  
> is never higher than 10.  ibrowse does pipeline requests on those 10  
> connections, so there could be as many as 1000 simultaneous HTTP  
> requests.  However, those requests complete as soon as the data  
> reaches the ibrowse client process, so in fact the number of  
> outstanding requests during replication is usually very small.  We're  
> not doing flow control at the TCP socket layer.

OK, I understand that now. That means that a document with > 1000  
attachments can't be replicated, because ibrowse will never send  
ibrowse_async_headers for the excess attachments to attachment_loop,  
and that has to happen for every attachment before any of the data is  
read by doc_flush_binaries. In other words, every attachment of a  
document needs to start, i.e. receive its headers, before any  
attachment bodies are consumed.

With concurrent replications the maximum number of attachments is  
reduced, and it's possible to get a deadlock where the ibrowse queue  
is full but no document has all of its attachment downloads started.
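
To make the argument concrete, here is a toy model of the behaviour as I
understand it. It is only a sketch, not the replicator code: the function
name, the round-robin hand-out of request slots, and the assumption that a
slot stays occupied until the document's bodies are consumed are all mine.

```python
def replicate(attachment_counts, capacity):
    """Toy model: return True if every document finishes, False on deadlock.

    attachment_counts: attachments per document, all replicated concurrently.
    capacity: total request slots in the shared queue (assumed, e.g. 1000).
    """
    started = [0] * len(attachment_counts)   # slots currently held per document
    done = [False] * len(attachment_counts)
    free = capacity

    while not all(done):
        progressed = False

        # A document can consume its bodies only once ALL of its
        # attachments have started; doing so releases its slots.
        for i, total in enumerate(attachment_counts):
            if not done[i] and started[i] == total:
                free += started[i]
                started[i] = 0
                done[i] = True
                progressed = True

        # Hand out free slots one at a time, round-robin (an assumption).
        for i, total in enumerate(attachment_counts):
            if not done[i] and started[i] < total and free > 0:
                started[i] += 1
                free -= 1
                progressed = True

        if not progressed:
            return False   # queue full, no document fully started
    return True


print(replicate([1000], 1000))      # single document that just fits
print(replicate([1001], 1000))      # one attachment too many
print(replicate([600, 600], 1000))  # each fits alone, together they stall
```

In this model two documents with 600 attachments each would replicate fine
one after the other, but started together they each end up holding half of
the queue and neither can finish.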

> I'm not sure I understand what part is "not scalable".  I agree that  
> ignoring the attachment receivers and their mailboxes when deciding  
> whether to checkpoint is a big problem.  I'm testing a fix for that  
> right now.  Is there something else you meant by that statement?   
> Best,

I didn't know about the ibrowse pool, so that part is scalable, i.e.  
it has a bounded number of connections and requests. If my comments  
above are correct, then the current architecture isn't scalable with  
respect to the number of attachments in the single-replicator case,  
and the limit follows a more complicated equation in the  
multiple-replicator case.

> P.S. One issue in my mind is that we only do the checkpoint test  
> after we receive a document.  We could end up in a situation where a  
> document request is sitting in a pipeline behind a huge attachment,  
> and the checkpoint test won't execute until the entire attachment is  
> downloaded into memory.  There are ways around this, e.g. using  
> ibrowse:spawn_link_worker_process/2 to bypass the default connection  
> pool for attachment downloads.

Requiring every attachment to be started but not completed seems to me  
to be a fundamental issue.

In my case, I have some large attachments and unreliable links, so I'm  
partial to a solution that preserves progress, even on partially  
downloaded attachments, across link failures. We could get this by not  
delaying the attachments, buffering them to disk instead, and using  
range requests on the GET to resume partial downloads. This would  
solve some problems because it starts from the requirement to always  
make progress and never redo work. It seems like it could be done  
reasonably transparently just by modifying the attachment download  
code.
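
A rough sketch of the resume logic, in Python rather than Erlang just to
show the shape. Everything here is hypothetical: `fetch` stands in for
whatever HTTP client call is used, takes the request headers, and yields
body chunks; `download_with_resume` and `range_header` are illustrative
names, not existing functions. It also assumes the total attachment length
is known up front, and that the source honours Range requests.

```python
import os


def range_header(offset):
    """Build an HTTP Range header asking for everything from offset onwards."""
    return {"Range": f"bytes={offset}-"}


def download_with_resume(fetch, path, total_size, max_attempts=10):
    """Append an attachment to `path`, resuming after each link failure.

    fetch(headers) is a hypothetical callable yielding byte chunks; it may
    raise ConnectionError part-way through. Bytes already on disk are never
    requested again.
    """
    def size_on_disk():
        return os.path.getsize(path) if os.path.exists(path) else 0

    for _ in range(max_attempts):
        offset = size_on_disk()
        if offset >= total_size:
            return True
        try:
            # Append mode: whatever arrived before a failure stays on disk.
            with open(path, "ab") as out:
                for chunk in fetch(range_header(offset)):
                    out.write(chunk)
        except ConnectionError:
            continue   # retry from the new, larger offset
    return size_on_disk() >= total_size
```

The point is that the file on disk is the only state needed: each retry
starts from its current length, so a dropped link costs at most the chunk
that was in flight.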

Antony Blakey
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Nothing is really work unless you would rather be doing something else.
   -- J. M. Barre
