couchdb-dev mailing list archives

From: Adam Kocoloski <kocol...@apache.org>
Subject: Re: svn commit: r775507 - /couchdb/trunk/src/couchdb/couch_rep.erl
Date: Sun, 17 May 2009 12:32:45 GMT
On May 16, 2009, at 8:44 PM, Antony Blakey wrote:

> On 17/05/2009, at 4:20 AM, Adam Kocoloski wrote:
>
>> Ok, so here's a start at reworking some of the memory management  
>> and buffering calculations.  It fixes the regression where  
>> attachment memory wasn't being included in the memory utilization  
>> numbers, and it also includes ibrowse memory utilization for  
>> attachments (which is larger than Couch's).
>>
>> The decision to flush the buffer (to disk or to the remote target  
>> server) is dependent on the number of docs in the buffer, the  
>> approximate number of attachments, and the memory utilization.  I  
>> estimate the number of attachments as 0.5*nlinks, since every  
>> attachment download spawns two processes: one dedicated ibrowse  
>> worker and the attachment receiver.  The dedicated ibrowse workers  
>> get the attachments out of the connection pool and let us keep a  
>> better eye on their memory usage.
>>
>> Each of the thresholds is currently just defined as a macro at the  
>> top of the module.  I haven't done any work on adjusting these  
>> thresholds dynamically or checkpointing as a function of elapsed  
>> time.
>>
>> The replication module is getting pretty hairy again; in my opinion
>> it's probably time to refactor out the attachment stuff into its own
>> module.  I may get around to that tomorrow if no one objects.
>
> What do you think about adding binary backoff to help with  
> unreliable links? Even if attachments are buffered to disk there's  
> still the issue of making checkpoint progress in the face of link  
> failure. Or maybe checkpoint the buffer on any failure (although  
> that won't help the situation where couchdb quits).
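For concreteness, the buffer-flush heuristic described in the quoted
message above might look roughly like this.  The macro names, threshold
values, and function signature are illustrative assumptions, not the
actual couch_rep.erl code:

    %% Sketch only: macro names and threshold values are invented for
    %% illustration, not the committed couch_rep.erl constants.
    -define(MAX_BUFFERED_DOCS, 1000).
    -define(MAX_BUFFERED_ATTACHMENTS, 50).
    -define(MAX_BUFFER_MEMORY, 10000000).  %% bytes

    %% Flush when any threshold is exceeded.  NLinks is the number of
    %% processes linked to the buffer owner; attachments are estimated
    %% as 0.5*NLinks because each download spawns a dedicated ibrowse
    %% worker plus an attachment receiver.
    should_flush(NDocs, NLinks, MemoryUsed) ->
        NAttachments = NLinks div 2,
        (NDocs >= ?MAX_BUFFERED_DOCS)
            orelse (NAttachments >= ?MAX_BUFFERED_ATTACHMENTS)
            orelse (MemoryUsed >= ?MAX_BUFFER_MEMORY).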

We already try to checkpoint in the event of failure, but it doesn't  
really help much because the checkpoint record has to be saved on both  
source and target in order to be recognized the next time around.
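To illustrate why a one-sided checkpoint doesn't count: the record is
stored (as a _local document) on both endpoints, and it is only
recognized on the next run if both copies were written.  A rough
sketch, where write_checkpoint_doc/2 is a hypothetical helper for
updating one endpoint's checkpoint doc:

    %% Sketch of the two-sided checkpoint described above.  The helper
    %% write_checkpoint_doc/2 is hypothetical; the point is that the
    %% checkpoint only counts if both writes succeed.
    checkpoint(Source, Target, RepId, History) ->
        Doc = {[{<<"_id">>, <<"_local/", RepId/binary>>},
                {<<"history">>, History}]},
        case {write_checkpoint_doc(Source, Doc),
              write_checkpoint_doc(Target, Doc)} of
            {ok, ok} -> ok;
            _Else    -> {error, checkpoint_commit_failure}
        end.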

I'm definitely in favor of adding code to help the replicator make  
progress in the face of link failure, whether that be some sort of  
backoff algorithm, caching and reusing partial attachment downloads,  
etc.
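As an example of the sort of backoff being discussed (not something
couch_rep.erl does today), a failed request over a flaky link could be
retried with an exponentially growing delay instead of failing the
whole replication; request/1 here is a hypothetical wrapper around the
ibrowse call:

    %% Sketch of exponential backoff for a single request over an
    %% unreliable link.  request/1 is a hypothetical wrapper around
    %% ibrowse:send_req; not part of the current replicator.
    retry_with_backoff(Req, Retries) ->
        retry_with_backoff(Req, Retries, 1000).

    retry_with_backoff(_Req, 0, _Delay) ->
        {error, retries_exhausted};
    retry_with_backoff(Req, Retries, Delay) ->
        case request(Req) of
            {ok, _Result} = Ok ->
                Ok;
            {error, _Reason} ->
                timer:sleep(Delay),
                retry_with_backoff(Req, Retries - 1, Delay * 2)
        end.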

Best, Adam
