couchdb-dev mailing list archives

From Adam Kocoloski <adam.kocolo...@gmail.com>
Subject Re: handling simultaneous identical replications
Date Thu, 05 Mar 2009 13:00:52 GMT
On Mar 5, 2009, at 7:24 AM, Jan Lehnardt wrote:

>
> On 5 Mar 2009, at 07:31, Paul Davis wrote:
>
>> On Wed, Mar 4, 2009 at 8:34 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:
>>> Hi folks, we've been running into a problem where multiple replications
>>> with the same source and target are running simultaneously.  This
>>> introduces quite a lot of unnecessary network traffic and causes real
>>> problems with update collisions on the local replication history
>>> documents.  If replicator A updates the source doc and replicator B
>>> updates the target doc, subsequent replications will decide that a full
>>> replication is necessary.
>>>
>>> I have some ideas about how to ensure only one is running at a time
>>> (more on that in a separate mail), but I'd like some feedback on how to
>>> handle the second..Nth request.  Let's call the initial POST to
>>> _replicate "A" and the second POST "B":
>>>
>>> Option 1 -- Respond to B with the results from A
>>> This option works fine if the source is remote.  However, if the source
>>> is local, the replication started by A will be missing updates to the
>>> source DB that occurred between A and B.  B may be surprised by that
>>> result.
>>>
>>> Option 2 -- Grab an updated DB and continue the replication
>>> This option will include updates to the source that occurred between A
>>> and B in the response to both requests.
>>>
>>> Option 3 -- Respond to A, then trigger another replication for B
>>> In this case we wait till the replication started by A has completed,
>>> then do an incremental one and respond to B with the results of that
>>> incremental.
>>>
>>> I think I'd vote for 3.  Cheers, Adam
>>>
>>>
>>
>> If I follow this correctly, the issue is, "POST to _replicate, a
>> second POST to _replicate occurs before the first request finishes"
>> (with the same source/target info).
>>
>> My knowledge of replication is only cursory, but I could also see:
>>
>> Option 4:
>>
>> Same as views, we wait for replication to finish and return the same
>> result to all clients that made a request.
>
> I understand this and Adam's option 3 to be the same. What am I
> missing? :)

No, not quite.  In Option 3 the two requesters get different
responses.  A gets the result of the original request.  B gets the
result of a second replication, triggered automatically once the first
one completes, which picks up any updates to the DB that happened
during the first pass.  If no updates occurred, B will receive the
result of the first replication.
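
A rough sketch of the Option 3 behaviour, in Python rather than Erlang and
not taken from the replicator code.  The function names and the result
fields (start_seq, end_seq, docs_written) are made up for illustration, and
the source DB is modelled as a plain list whose length stands in for the
update sequence:

```python
def replicate(source, target, since, upto):
    """Copy source updates in [since, upto) to target and return a result."""
    target.extend(source[since:upto])
    return {"start_seq": since, "end_seq": upto, "docs_written": upto - since}


def option3(source, target, checkpoint, updates_during_a):
    """Return (response_to_A, response_to_B) under the Option 3 model."""
    # A works against the state of the source as it was when A was posted.
    snapshot_a = len(source)
    # Writes land on the source while A is running; B is posted in this window.
    source.extend(updates_during_a)
    result_a = replicate(source, target, checkpoint, snapshot_a)
    if len(source) == snapshot_a:
        # Nothing new arrived, so B simply receives A's result.
        return result_a, result_a
    # Otherwise run an incremental pass from where A stopped and give B that.
    result_b = replicate(source, target, snapshot_a, len(source))
    return result_a, result_b
```

With three docs in the source and two more written during A, A's response
covers the first three and B's response covers only the two new ones.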

Paul's Option 4 is more like Options 1 and 2, where A and B get  
identical responses.  The difference between 1 and 2 is just whether  
new updates get included in that response.

Whew.

>> Option 5:
>>
>> Return an error on B that says, "Yeah, yeah. Already on it."
>
> This would make replication behave a bit like compaction.

Sort of, in that additional triggers are no-ops.  Option 1 also has  
that behavior.
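
The "additional triggers are no-ops" idea could be sketched like this in
Python.  The class and method names are hypothetical, and this says nothing
about how the real replicator tracks running jobs; it only shows a registry
keyed on the (source, target) pair:

```python
class ActiveReplications:
    """Tracks which (source, target) pairs currently have a replication."""

    def __init__(self):
        self._running = set()

    def try_start(self, source, target):
        """Return True if the caller should start replicating, else False."""
        key = (source, target)
        if key in self._running:
            # An identical replication is already in flight: do nothing.
            return False
        self._running.add(key)
        return True

    def finish(self, source, target):
        """Mark the replication for this pair as completed."""
        self._running.discard((source, target))
```

Under Option 5 a False return would turn into the error response; under
Option 1 it would mean "wait for the running job and reuse its result".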

> I think I like 3/4 best.
>
> Cheers
> Jan
> --


Best, Adam

