couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: replication usage? creating dupes?
Date Wed, 16 Jul 2008 18:01:05 GMT

On Jul 16, 2008, at 19:06, Damien Katz wrote:

>
> On Jul 16, 2008, at 12:05 PM, Paul Davis wrote:
>
>> I haven't really gotten into replication yet, but did I read that
>> right? The browser request for replication isn't expected to return
>> until replication has completed? On the surface of things that seems
> fairly ungood. What happens in the future when I have a multi-gigabyte
>> database I want to replicate from scratch to a new node over a slow
>> connection?
>>
>> If I'm not completely off my rocker, perhaps a better solution is  
>> that
>> the browser request for replication returns immediately and then
>> couchdb would provide a method for checking on the status of the
>> replication.
>
> That's an option for the future, but then we'd have to create a  
> replication task monitor infrastructure that can be queried from  
> HTTP. That's additional complexity and overhead.
>
> If you want to fire and forget, then that's an easy option. But if  
> you want to fire it off, and monitor it, and shut it down, then all  
> that has to be written and tested and documented.
>
> Instead, we can just have a synchronous HTTP request and get most of  
> that for free. If the request is alive, you know the replication is  
> still running. If you want to kill it, then terminate the  
> connection. If you want to know when it's done, then simply wait for
> the completion. We could even send updates about progress of the  
> replication over chunked HTTP.
>
> So for now I think we stick with the simpler design and enhance it  
> to do what we want, until we hit the wall and need to build  
> something bigger.
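Damien's last point above — progress updates streamed over chunked HTTP, with the open request itself acting as the monitor and a closed connection acting as the kill switch — can be sketched in a few lines. This is a minimal, illustrative mock (the handler, the `/_replicate` path reuse, and the `progress N%` lines are assumptions, not CouchDB's actual wire format):

```python
# Sketch of the "monitor via the open request" idea: the server streams
# progress updates as HTTP chunks; the client watches the replication by
# keeping the one request open, and cancels it by closing the connection.
import http.client
import http.server
import threading

class ReplicateHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"               # chunked encoding needs 1.1

    def do_POST(self):
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for done in (25, 50, 75, 100):          # stand-in for real progress
            body = ("progress %d%%\n" % done).encode()
            self.wfile.write(b"%x\r\n%s\r\n" % (len(body), body))
        self.wfile.write(b"0\r\n\r\n")          # terminating empty chunk

    def log_message(self, *args):               # keep the demo quiet
        pass

def watch_replication(port):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("POST", "/_replicate")
    resp = conn.getresponse()                   # http.client de-chunks for us
    lines = resp.read().decode().splitlines()
    conn.close()                                # closing = cancelling the job
    return lines

server = http.server.HTTPServer(("127.0.0.1", 0), ReplicateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
updates = watch_replication(server.server_address[1])
server.shutdown()
```

Note that the terminating chunk in the last `wfile.write` is the zero-length chunk HTTP 1.1 uses to end a chunked body; a keep-alive ping mid-stream would be a small non-empty chunk, since an empty one ends the response.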

I agree with Chris and Damien :)

For now it might be good to put a disclaimer into Futon that
says you need to wait for the replication to finish, or not to
use Futon at all for longer replications and to use curl et al. instead.
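For example, a long replication can be kicked off against CouchDB's `_replicate` endpoint from the command line (host and database names below are placeholders); the request blocks until the replication completes, so for multi-hour jobs run it under `nohup` or inside screen/tmux rather than in a browser:

```shell
curl -X POST http://127.0.0.1:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "mydb", "target": "http://remote.example.com:5984/mydb"}'
```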

Cmlenz? :)

Cheers
Jan
--


>
>
> -Damien
>
>>
>>
>> Feel free to ignore me if I have this completely wrong.
>>
>> Paul
>>
>> On Wed, Jul 16, 2008 at 11:58 AM, Damien Katz <damien@apache.org>  
>> wrote:
>>> That problem is likely due to the fact that the user HTTP request is
>>> timing out
>>> while waiting for the replication to complete, which in turn kills
>>> the
>>> underlying replication process. Restarting the replication will  
>>> usually help
>>> as CouchDB avoids sending the same document twice, but if the  
>>> replication is
>>> exceptionally long it might not get past the point where it
>>> finishes
>>> examining the documents.
>>>
>>> The problem is it only saves off the replication record once it
>>> completes
>>> successfully, so until it completes it always examines the same
>>> number of
>>> documents to see if they exist on the target replica. The fix I  
>>> need to
>>> implement is to have it save off the replication record every x  
>>> seconds
>>> during replication, then if it dies unexpectedly it will pick back  
>>> up from
>>> the last replication record, reducing the number of documents  
>>> needing to be
>>> reexamined.
>>>
>>> Then we need to solve the current problem of the synchronous HTTP
>>> request to
>>> perform the replication. In Futon, the browser doesn't do the  
>>> replication,
>>> it just sends a single replication request to the CouchDB server.  
>>> A CouchDB
>>> Erlang process then performs the replication, accessing databases
>>> either
>>> locally or via HTTP on other Erlang servers. Right now, the  
>>> browser can
>>> time out the HTTP request during a long replication, which in turn
>>> kills the
>>> replication process.
>>>
>>> There are two potential solutions here: the first is to send a
>>> browser ping
>>> to keep the connection alive. Easy to do with HTTP 1.1, I think,
>>> just send an
>>> empty HTTP chunk. The second is to make it impossible for the  
>>> broken HTTP
>>> request to kill the replication request. They aren't mutually  
>>> exclusive, but
>>> the more I think about it, the more I dislike the second solution.
>>>
>>> -Damien
>>>
>>>
>>> On Jul 16, 2008, at 11:13 AM, Chris Anderson wrote:
>>>
>>>> On Wed, Jul 16, 2008 at 2:18 AM, Jan Lehnardt <jan@apache.org>  
>>>> wrote:
>>>>>
>>>>> I'm surprised that this wasn't reported earlier. CouchDB
>>>>> replication
>>>>> is supposed to be reliable (when we got all the bugs out), so an
>>>>> external replication thing should not be necessary. I would have
>>>>> guessed that reporting this is easier than writing code to  
>>>>> circumvent
>>>>> the problem. This should be fixed in CouchDB and not worked
>>>>> around.
>>>>
>>>> My experience with replication has been that it works flawlessly  
>>>> for
>>>> smaller datasets, and as the dataset grows, it either starts to  
>>>> take
>>>> so long it may as well be broken (but shows no errors in the log)  
>>>> or
>>>> occasionally does the =ERROR REPORT==== thing in the log. The  
>>>> latter is
>>>> a new symptom in my experience.
>>>>
>>>> I haven't had a chance to bring my install up to latest trunk, so I
>>>> hesitated to report it. Today's my only sane day for a couple of  
>>>> weeks
>>>> on each side, so I'll see what progress I can make.
>>>>
>>>> Chris
>>>>
>>>>
>>>> --
>>>> Chris Anderson
>>>> http://jchris.mfdz.com
>>>
>>>
>
>
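The periodic-checkpoint fix Damien describes in the quoted message — save the replication record every x seconds so a killed replication resumes from the last checkpoint instead of re-examining every document — can be sketched as follows. All names here (`Replicator`, `saved_seq`, `fail_at_seq`) are illustrative, not CouchDB's actual internals:

```python
# Sketch of periodic checkpointing for a resumable replication.
import time

class Replicator:
    def __init__(self, source_docs, checkpoint_interval=5.0):
        self.source_docs = source_docs            # (seq, doc_id) pairs
        self.checkpoint_interval = checkpoint_interval
        self.saved_seq = 0                        # the persisted record

    def replicate(self, fail_at_seq=None):
        """Push docs newer than saved_seq; return how many were examined.
        fail_at_seq simulates the connection dying mid-replication."""
        examined = 0
        last_checkpoint = time.monotonic()
        for seq, doc_id in self.source_docs:
            if seq <= self.saved_seq:
                continue                          # already on the target
            if fail_at_seq is not None and seq >= fail_at_seq:
                raise ConnectionError("replication died at seq %d" % seq)
            examined += 1
            # ... actually transfer doc_id to the target here ...
            if time.monotonic() - last_checkpoint >= self.checkpoint_interval:
                self.saved_seq = seq              # checkpoint progress
                last_checkpoint = time.monotonic()
        if self.source_docs:
            self.saved_seq = self.source_docs[-1][0]   # final record
        return examined
```

With `checkpoint_interval=0.0` every document checkpoints, so a run that dies at sequence 6 of 10 resumes by examining only documents 6 through 10 rather than all ten — which is exactly the reduction in re-examined documents the quoted message is after.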

