couchdb-replication mailing list archives

From Adam Kocoloski <>
Subject Re: Checkpointing on read only databases
Date Sun, 13 Apr 2014 17:58:44 GMT
Other thoughts:

- We could enhance the authorization system to have a role that allows updates to _local docs
but nothing else. It wouldn't make sense for completely untrusted peers, but it could give
peace of mind to sysadmins trying to execute replications with the minimum level of access.

- We could teach the sequence index to maintain a rolling hash of the {id,rev} pairs
that comprise the database up to that sequence, record that in the replication checkpoint
document, and check that it's unchanged on resume. It's a new API enhancement and it grows
the amount of information stored with each sequence, but it completely closes off the probabilistic
edge case associated with simply checking that the {id, rev} associated with the checkpointed
sequence has not changed. Perhaps overkill for what is admittedly a pretty low-probability
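
A minimal sketch of the rolling-hash idea above, as a toy model only. The function names, the use of SHA-256, and the representation of the database as an ordered list of (id, rev) pairs are all assumptions made for illustration; none of this is an existing CouchDB API. A cryptographic hash makes an undetected change extremely unlikely rather than strictly impossible.

```python
import hashlib

def rolling_hash(prev_digest: bytes, doc_id: str, rev: str) -> bytes:
    """Fold one {id, rev} pair into the digest of everything before it."""
    h = hashlib.sha256()
    h.update(prev_digest)
    for field in (doc_id, rev):
        data = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()

def history_digest(changes):
    """Digest of an ordered list of (id, rev) pairs (hypothetical helper)."""
    digest = b"\x00" * 32
    for doc_id, rev in changes:
        digest = rolling_hash(digest, doc_id, rev)
    return digest.hex()

def can_resume(checkpoint, source_changes):
    """True if the source history up to the checkpointed seq is unchanged."""
    seq = checkpoint["seq"]
    prefix = source_changes[:seq]
    return len(prefix) == seq and history_digest(prefix) == checkpoint["history_hash"]

# Toy data: the sequence number is simply the position in the list.
original = [("a", "1-x"), ("b", "1-y"), ("c", "1-z")]
# A recreated source that happens to have the same {id, rev} at seq 3.
recreated = [("b", "1-y"), ("a", "1-x"), ("c", "1-z")]

checkpoint = {"seq": 3, "history_hash": history_digest(original)}
```

Here `can_resume(checkpoint, original)` holds, while `can_resume(checkpoint, recreated)` does not, even though a check of only the {id, rev} at sequence 3 would accept the recreated source.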


On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <> wrote:

> Yeah, this is a subtle little thing. The main reason we checkpoint on both source and
target and compare is to cover the case where the source database is deleted and recreated
in between replication attempts. If that were to happen and the replicator just resumes blindly
from the checkpoint sequence stored on the target then the replication could permanently miss
some documents written to the new source.
> I'd love to have a robust solution for incremental replication of read-only databases.
To first order a UUID on the source database that was fixed at create time could do the trick,
but we'll run into trouble with file-based backup and restores. If a database file is restored
to a point before the latest replication checkpoint we'd again be in a position of potentially
permanently missing updates.
> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply seq as the checkpoint
information would dramatically reduce the likelihood of that type of permanent skip in the
replication, but it's only a probabilistic answer.
> Adam
> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <> wrote:
>> Though currently we have the opposite problem, right, if we delete the target
>> db? (This is me brainstorming.)
>> Could we store last rev in addition to last seq?
>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <> wrote:
>>> If the src database was to be wiped, when we restarted replication nothing
>>> would happen until the source database caught up to the previously written
>>> checkpoint
>>>  create A, write 5 documents
>>>  replicate 5 documents A -> B, write checkpoint 5 on B
>>>  destroy A
>>>  write 4 documents
>>>  replicate A -> B, pick up checkpoint from B and go to ?since=5
>>>  .. no documents written
>>> our test that covers it
>>> On 13 April 2014 18:02, Calvin Metcalf <> wrote:
>>>> If we were to unilaterally switch to checkpointing on the target, what would
>>>> happen? Would replications in progress lose their place?
>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <> wrote:
>>>>> So with checkpointing we write the checkpoint to both A and B and verify
>>>>> they match before using the checkpoint.
>>>>> What happens if the src of the replication is read only?
>>>>> As far as I can tell couch will just throw a checkpoint_commit_error and
>>>>> carry on from the start. The only improvement I can think of is that the user
>>>>> specifies they know the src is read only and to only use the target
>>>>> checkpoint; we can 'possibly' make that happen automatically if the src
>>>>> specifically fails the write due to permissions.
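
The failure mode discussed in this thread (a target-only checkpoint after the source is destroyed and recreated) and the proposed {seq, id, rev} check can be sketched as follows. This is a toy model under stated assumptions: a database is an ordered list of (id, rev) pairs, the sequence number is the 1-based list position, and `changes_since` and `checkpoint_still_valid` are hypothetical helpers, not CouchDB or PouchDB functions.

```python
def changes_since(db, since):
    """Toy stand-in for a changes feed: entries with seq greater than `since`."""
    return [(seq, doc_id, rev)
            for seq, (doc_id, rev) in enumerate(db, start=1)
            if seq > since]

def checkpoint_still_valid(db, cp):
    """Check that the source still holds the checkpointed {id, rev} at that seq."""
    seq = cp["seq"]
    if seq == 0:
        return True   # nothing replicated yet, so nothing to verify
    if seq > len(db):
        return False  # source has not reached the checkpointed seq
    return db[seq - 1] == (cp["id"], cp["rev"])

# Create A and write 5 documents; replicate A -> B and checkpoint at seq 5.
source_a = [("doc%d" % i, "1-aaa%d" % i) for i in range(1, 6)]
checkpoint = {"seq": 5, "id": source_a[4][0], "rev": source_a[4][1]}

# Destroy A, recreate it, and write 4 documents.
recreated_a = [("new%d" % i, "1-bbb%d" % i) for i in range(1, 5)]

# Resuming blindly from the target's checkpoint replicates nothing.
missed = changes_since(recreated_a, checkpoint["seq"])
```

In this model `missed` is empty, which matches the "no documents written" outcome described above, while `checkpoint_still_valid(recreated_a, checkpoint)` is false and would signal that replication should restart from the beginning. As noted in the thread, the check is only probabilistic: a recreated source could in principle hold the same {id, rev} at the same sequence.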
