couchdb-user mailing list archives

From Sho Fukamachi
Subject Re: replication error
Date Sun, 01 Feb 2009 18:27:03 GMT

On 02/02/2009, at 5:01 AM, Adam Kocoloski wrote:

>> [...]
> That's odd.  I tried setting a 120 second timeout and didn't have  
> any trouble.  Then again, I only ran the test suite; I didn't  
> actually force a timeout to occur or anything.  Sorry, I don't have  
> any hints at the moment.

Guh. I'm an idiot. I'd forgotten to create the destination database.
In my haste to test it I used Futon, not my normal script, and of
course interpreted the error as something wrong with the code I'd
changed.

Sorry about that. : /

With the changes, it worked the first time, although it did give a
spurious error about a server having restarted.

> Multipart won't solve the problem where ibrowse throws a timeout  
> error even while it's still sending data.  That seems like a pretty  
> curious choice on ibrowse's part to me.  Maybe when I have some more  
> free time I can look into the timeout algo and see if it can be  
> tweaked so that it only starts after the request has been fully  
> transmitted.  I think that would pretty much solve this problem.   
> Barring that, I agree that some sort of back-off algorithm that  
> lengthens the timeout after each failed request is warranted.
> There's also one more knob we can turn.  During replication we are  
> checking the memory consumption of the process collecting docs to  
> send to the target.  If it hits 10MB we send the bulk immediately,  
> regardless of whether it's 1 doc, 10, or 99.  10MB may be much too  
> high given a 30 second timeout window in which we have to transmit  
> the data; 1MB is possibly a better fit for home broadband users.  If  
> you want to fiddle with that knob instead of the ibrowse timeout you  
> can try changing line 224 of couch_rep.erl so that instead of
> couch_util:should_flush()
> it would read (value is in bytes)
> couch_util:should_flush(1000000)

Awesome tip. Thanks. Yeah, I had never noticed any problem with server  
to server replication... only when I then tried to do it from home...
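
For anyone following along, here is a rough sketch of the idea behind that knob: buffer docs and flush as soon as the buffer crosses a byte threshold, even if the doc count is still low. This is not the code in couch_rep.erl. It is Python rather than Erlang, it measures serialized JSON size instead of process memory, and the names batch_by_size, max_docs and min_upload_mbit_per_s are made up for illustration. The cap of 100 docs per bulk is an assumption read off the "1 doc, 10, or 99" remark.

```python
# Illustrative sketch only; not CouchDB's actual replicator code.
import json


def batch_by_size(docs, max_bytes=1_000_000, max_docs=100):
    """Yield lists of docs, flushing when the buffered JSON size reaches
    max_bytes or the buffer holds max_docs documents (both hypothetical
    parameters)."""
    batch, size = [], 0
    for doc in docs:
        batch.append(doc)
        # Approximate the payload size by the doc's serialized JSON length.
        size += len(json.dumps(doc).encode("utf-8"))
        if size >= max_bytes or len(batch) >= max_docs:
            yield batch
            batch, size = [], 0
    if batch:
        # Flush whatever is left once the input is exhausted.
        yield batch


def min_upload_mbit_per_s(batch_bytes, timeout_s=30):
    """Upstream bandwidth (Mbit/s) needed to send batch_bytes in timeout_s."""
    return batch_bytes * 8 / timeout_s / 1_000_000
```

By this arithmetic a 10MB bulk needs roughly 2.7 Mbit/s of upstream bandwidth to get out within 30 seconds, while a 1MB bulk needs about 0.27 Mbit/s, which is why the smaller threshold looks like a better fit for a home connection.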

> I don't have a strong opinion at this point in time about how many  
> of these parameters ought to be tunable in local.ini.  Best,

My opinion is usually that pretty much everything with a big effect,
like this, should have a sensible default but be overridable in
config. Failing that, maybe the default timeout should be raised?
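
To make the "sensible default, but overridable" idea concrete together with the back-off suggestion quoted above, here is a minimal sketch. Everything in it is an assumption for illustration: it is Python rather than Erlang, it does not use ibrowse, and backoff_timeouts, send_with_backoff and their parameters (doubling factor, 480 second cap, 5 attempts) are hypothetical. Only the 30 second starting value comes from this thread.

```python
# Illustrative sketch only; not ibrowse's or CouchDB's actual behaviour.

def backoff_timeouts(base_s=30.0, factor=2.0, max_s=480.0, retries=5):
    """Return the timeout to use on each attempt, lengthened after every
    failure and capped at max_s. All defaults can be overridden."""
    timeouts = []
    t = base_s
    for _ in range(retries):
        timeouts.append(min(t, max_s))
        t *= factor
    return timeouts


def send_with_backoff(send, payload, **overrides):
    """Call send(payload, timeout) until it succeeds or attempts run out.

    `send` is a caller-supplied function that raises TimeoutError when the
    request times out; `overrides` are passed to backoff_timeouts.
    """
    last_error = None
    for timeout in backoff_timeouts(**overrides):
        try:
            return send(payload, timeout)
        except TimeoutError as err:
            last_error = err  # remember the failure, retry with more time
    raise TimeoutError("all attempts timed out") from last_error
```

With the defaults this tries 30, 60, 120, 240 and then 480 seconds. A caller with a slow uplink could pass base_s=120 to start higher, in the same way a local.ini setting would override a built-in default.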

Thanks heaps for your help.

