couchdb-dev mailing list archives

From Adam Kocoloski <>
Subject Re: Attachment Replication Problem - Bug Found
Date Sat, 16 May 2009 15:07:16 GMT
Hi Antony,

On May 16, 2009, at 10:39 AM, Antony Blakey wrote:

> I can confirm that the target and source of replicated resources  
> affected by this issue are identical after applying this fix, and  
> both are correct, i.e. uncorrupted, although this is only according  
> to the failures I've seen.

Thanks!  Makes me feel better, at least.

>> Now, on to the checkpointing conditions.  I think there's some  
>> confusion about the attachment workflow.  Attachments are  
>> downloaded _immediately_ and in their entirety by ibrowse, which  
>> then sends the data as 1MB binary chunks to the attachment receiver  
>> processes.
> Are they downloaded to disk by ibrowse?

No, I don't believe so.  ibrowse accepts a {stream_to, pid()} option.   
It accumulates packets until it reaches a threshold configurable by  
{stream_chunk_size, integer()} (default 1MB), then sends the data to  
the Pid.  I don't think ibrowse is writing to disk at any point in  
the process.  We do see that when streaming really large attachments,  
ibrowse becomes the biggest memory user in the emulator.

ibrowse does offer a {save_response_to_file, boolean()|filename()}  
option that we could possibly leverage.
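For readers unfamiliar with the streaming behavior described above, here is a rough Python sketch of the accumulate-and-forward pattern: buffer incoming network packets until a threshold is reached, then hand a fixed-size chunk to a consumer. The function and parameter names are illustrative only; ibrowse itself is an Erlang library and its {stream_to, Pid} / {stream_chunk_size, N} mechanics live in Erlang message passing, not in this code.

```python
def stream_response(packets, consumer, chunk_size=1024 * 1024):
    """Accumulate incoming packets and forward them to `consumer`
    in chunks of `chunk_size` bytes (default 1MB), mimicking the
    accumulate-until-threshold behavior described for ibrowse."""
    buffer = bytearray()
    for packet in packets:
        buffer.extend(packet)
        # Forward full chunks as soon as the threshold is reached.
        while len(buffer) >= chunk_size:
            consumer(bytes(buffer[:chunk_size]))
            del buffer[:chunk_size]
    # Flush whatever remains when the response body ends.
    if buffer:
        consumer(bytes(buffer))
```

Note that the whole response still passes through memory, which matches the observation that ibrowse becomes the biggest memory user when streaming very large attachments.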

>> In another thread Matt Goodall suggested checkpointing after a  
>> certain amount of time has passed.  So we'd have a checkpointing  
>> algo that considers
>> * memory utilization
>> * number of pending writes
>> * time elapsed
> That seems to cover both resource usage and incremental progress. As  
> far as the couch_util:should_flush mechanism is concerned, I think a  
> good idea would be to commit 1 document, then 2, then 4, i.e. a  
> binary increasing window. That adapts well to both unreliable and  
> reliable connections without requiring configuration, which is  
> tricky because you may want to run the system in a variety of  
> scenarios, and you might not know what the failure characteristics  
> are (and they may change over time).

It sounds like a good idea.  I had thought about doing the same for  
the process that pulls new docs from the source server, so that we  
could do a better job of filling up the pipes when we're dealing with  
the common case of small documents without significant attachment data.
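The two ideas in this exchange can be sketched together: checkpoint when memory use, pending writes, or elapsed time crosses a threshold, and grow the commit window binarily (1, 2, 4, ...) while commits keep succeeding. This is a hypothetical Python sketch, not CouchDB's couch_util:should_flush implementation; every name and threshold here is an assumption chosen for illustration.

```python
import time

class CheckpointPolicy:
    """Illustrative checkpoint policy combining the three criteria
    discussed (memory utilization, pending writes, elapsed time)
    with a binary increasing commit window."""

    def __init__(self, max_memory=64 * 1024 * 1024,
                 max_pending=1000, max_elapsed=60.0):
        self.max_memory = max_memory      # bytes
        self.max_pending = max_pending    # hard cap on queued writes
        self.max_elapsed = max_elapsed    # seconds between checkpoints
        self.window = 1                   # docs per commit: 1, 2, 4, ...
        self.last_checkpoint = time.monotonic()

    def should_checkpoint(self, memory_used, pending_writes):
        elapsed = time.monotonic() - self.last_checkpoint
        return (memory_used >= self.max_memory
                or pending_writes >= self.max_pending
                or pending_writes >= self.window
                or elapsed >= self.max_elapsed)

    def on_commit(self, succeeded):
        self.last_checkpoint = time.monotonic()
        if succeeded:
            self.window *= 2   # reliable link: commit less often
        else:
            self.window = 1    # failure: fall back to small commits
```

The doubling-on-success, reset-on-failure shape is what lets the window adapt to both unreliable and reliable connections without any configuration, as suggested above.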

> While we're on this - any idea why couchdb is quitting during  
> replication? It's not giving me any errors.

Errm, no, I'm afraid I don't have any idea there.  I remember one or  
two other reports in JIRA that sounded similar, but I've not been able  
to reproduce them.  Are you keeping an eye on the memory usage?  I  
think an out-of-memory error can trigger this kind of sudden death in  
Erlang.  Sorry, that's the best I've got at the moment.

