couchdb-replication mailing list archives

From Adam Kocoloski <adam.kocolo...@gmail.com>
Subject Re: Checkpointing on read only databases
Date Sun, 13 Apr 2014 22:00:38 GMT
Yes, but then sysadmins have to be very very careful about restoring from a file-based backup.
We run the risk that {uuid, seq} could be multi-valued, which diminishes its value considerably.

I like the UUID in general -- we've added them to our internal shard files at Cloudant --
but on their own they're not a bulletproof solution for read-only incremental replications.

Adam
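
A minimal sketch of the backup concern above, assuming a toy source whose sequence number is simply the count of writes. The UUID value, the document names, and the variable names are all hypothetical; nothing here is CouchDB API.

```python
DB_UUID = "hypothetical-uuid"  # assumed to be fixed when the database is created

original = ["a", "b", "c", "d", "e"]     # writes at seq 1..5
checkpoint = (DB_UUID, len(original))    # replication checkpoints at (uuid, 5)

backup = original[:3]                    # file-based backup taken at seq 3
restored = backup + ["x", "y"]           # restore, then two new writes at seq 4 and 5

# The restored file carries the same UUID and reaches the same sequence,
# so the pair (uuid, 5) now names two different database states.
same_key = (DB_UUID, len(restored)) == checkpoint
different_contents = restored != original

# A replicator that trusts the pair resumes at since=5 and never sees these.
skipped = restored[3:checkpoint[1]]
```

Here `skipped` holds the two documents written after the restore, which is the multi-valued {uuid, seq} case described above.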

> On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> 
> I mean if you're going to add new features to couch you could just have the
> db generate a random uuid on creation that would be different if it was
> deleted and recreated.
>> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <adam.kocoloski@gmail.com> wrote:
>> 
>> Other thoughts:
>> 
>> - We could enhance the authorization system to have a role that allows
>> updates to _local docs but nothing else. It wouldn't make sense for
>> completely untrusted peers, but it could give peace of mind to sysadmins
>> trying to execute replications with the minimum level of access possible.
>> 
>> - We could teach the sequence index to maintain a rolling hash
>> of the {id,rev} pairs that comprise the database up to that sequence,
>> record that in the replication checkpoint document, and check that it's
>> unchanged on resume. It's a new API enhancement and it grows the amount of
>> information stored with each sequence, but it completely closes off the
>> probabilistic edge case associated with simply checking that the {id, rev}
>> associated with the checkpointed sequence has not changed. Perhaps overkill
>> for what is admittedly a pretty low-probability event.
>> 
>> Adam
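
The rolling-hash idea in the second bullet could be sketched roughly as follows. This assumes an append-only list of (id, rev) pairs in sequence order, which ignores how a real sequence index drops superseded entries; `rolling_hashes` and `can_resume` are hypothetical names, and SHA-256 is just one possible choice of hash.

```python
import hashlib


def rolling_hashes(changes):
    """Return one hex digest per sequence; entry i covers changes[0..i]."""
    digest = b""
    hashes = []
    for doc_id, rev in changes:
        # Chain the previous digest with the next {id, rev} pair.
        digest = hashlib.sha256(digest + f"{doc_id}\x00{rev}".encode()).digest()
        hashes.append(digest.hex())
    return hashes


def can_resume(source_changes, checkpoint_seq, checkpoint_hash):
    """True only if the source history up to checkpoint_seq is unchanged."""
    if checkpoint_seq < 1 or checkpoint_seq > len(source_changes):
        return False
    return rolling_hashes(source_changes)[checkpoint_seq - 1] == checkpoint_hash


original = [("a", "1-aaa"), ("b", "1-bbb"), ("c", "1-ccc")]
checkpoint_seq = 3
checkpoint_hash = rolling_hashes(original)[checkpoint_seq - 1]

# Recreated source: same {id, rev} at seq 3, different history before it.
recreated = [("x", "1-xxx"), ("y", "1-yyy"), ("c", "1-ccc")]

unchanged_ok = can_resume(original + [("d", "1-ddd")], checkpoint_seq, checkpoint_hash)
recreated_ok = can_resume(recreated, checkpoint_seq, checkpoint_hash)
```

Because each digest depends on every earlier pair, the recreated source is rejected even though the pair at the checkpointed sequence matches.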
>> 
>> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <adam.kocoloski@gmail.com>
>> wrote:
>> 
>>> Yeah, this is a subtle little thing. The main reason we checkpoint on
>>> both source and target and compare is to cover the case where the source
>>> database is deleted and recreated in between replication attempts. If that
>>> were to happen and the replicator just resumes blindly from the checkpoint
>>> sequence stored on the target then the replication could permanently miss
>>> some documents written to the new source.
>>> 
>>> I'd love to have a robust solution for incremental replication of
>>> read-only databases. To first order a UUID on the source database that was
>>> fixed at create time could do the trick, but we'll run into trouble with
>>> file-based backup and restores. If a database file is restored to a point
>>> before the latest replication checkpoint we'd again be in a position of
>>> potentially permanently missing updates.
>>> 
>>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply seq
>>> as the checkpoint information would dramatically reduce the likelihood of
>>> that type of permanent skip in the replication, but it's only a
>>> probabilistic answer.
>>> 
>>> Adam
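
A rough sketch of the {seq, id, rev} checkpoint, and of why it is only probabilistic. The `Checkpoint` type, the `resume_seq` helper, and the dictionaries standing in for the source's sequence index are all hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Checkpoint(NamedTuple):
    seq: int
    doc_id: str
    rev: str


def resume_seq(changes: Dict[int, Tuple[str, str]], cp: Optional[Checkpoint]) -> int:
    """Return the sequence to resume from, or 0 to start over.

    `changes` maps each sequence on the current source to its (id, rev).
    """
    if cp is not None and changes.get(cp.seq) == (cp.doc_id, cp.rev):
        return cp.seq
    return 0


cp = Checkpoint(seq=3, doc_id="c", rev="1-ccc")

same_source = {1: ("a", "1-aaa"), 2: ("b", "1-bbb"), 3: ("c", "1-ccc"), 4: ("d", "1-ddd")}
recreated = {1: ("x", "1-xxx"), 2: ("y", "1-yyy"), 3: ("z", "1-zzz")}
# Unlucky recreation: different history, but the same pair lands on seq 3.
collision = {1: ("x", "1-xxx"), 2: ("y", "1-yyy"), 3: ("c", "1-ccc")}

resume_same = resume_seq(same_source, cp)
resume_recreated = resume_seq(recreated, cp)
resume_collision = resume_seq(collision, cp)
```

The recreated source is caught and replication restarts from 0, but the `collision` case still resumes from 3 and would skip the two earlier documents.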
>>> 
>>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <calvin.metcalf@gmail.com>
>>>> wrote:
>>>> 
>>>> Though currently we have the opposite problem, right, if we delete the
>>>> target db? (this is me brainstorming)
>>>> 
>>>> Could we store last rev in addition to last seq?
>>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <dale@arandomurl.com> wrote:
>>>>> 
>>>>> If the src database was to be wiped, when we restarted replication
>>>>> nothing would happen until the source database caught up to the
>>>>> previously written checkpoint
>>>>> 
>>>>> create A, write 5 documents
>>>>> replicate 5 documents A -> B, write checkpoint 5 on B
>>>>> destroy A
>>>>> write 4 documents
>>>>> replicate A -> B, pick up checkpoint from B and go to ?since=5
>>>>> .. no documents written
>>>>> 
>>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
>>>>> is our test that covers it
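
The scenario above can be walked through with a toy model. This is only a sketch: the `Source` class and `replicate` function are hypothetical stand-ins, not PouchDB or CouchDB code, and sequences are assumed to be simple counters starting at 1.

```python
class Source:
    """Toy stand-in for a source database with an update sequence."""

    def __init__(self):
        self.changes = []  # list of (seq, doc_id) in write order

    def write(self, doc_id):
        self.changes.append((len(self.changes) + 1, doc_id))

    def changes_since(self, since):
        return [doc_id for seq, doc_id in self.changes if seq > since]


def replicate(source, target_docs, target_checkpoint):
    """Copy changes newer than the target-side checkpoint; return the new checkpoint."""
    for doc_id in source.changes_since(target_checkpoint):
        target_docs.add(doc_id)
    last_seq = source.changes[-1][0] if source.changes else 0
    return max(target_checkpoint, last_seq)


a = Source()
for i in range(5):
    a.write(f"old-{i}")
b_docs = set()
checkpoint = replicate(a, b_docs, 0)            # checkpoint 5 written on B

a = Source()                                    # destroy A and recreate it
for i in range(4):
    a.write(f"new-{i}")
before = len(b_docs)
checkpoint = replicate(a, b_docs, checkpoint)   # resumes with since=5
nothing_copied = len(b_docs) == before
```

With only the target-side checkpoint to go on, the second run asks for changes after sequence 5, the recreated source has none, and the four new documents never reach B.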
>>>>> 
>>>>> 
>>>>> On 13 April 2014 18:02, Calvin Metcalf <calvin.metcalf@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> If we were to unilaterally switch to checkpoint on target what would
>>>>>> happen, replications in progress would lose their place?
>>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <dale@arandomurl.com> wrote:
>>>>>>> 
>>>>>>> So with checkpointing we write the checkpoint to both A and B and
>>>>>>> verify they match before using the checkpoint.
>>>>>>> 
>>>>>>> What happens if the src of the replication is read only?
>>>>>>> 
>>>>>>> As far as I can tell couch will just throw a checkpoint_commit_error
>>>>>>> and carry on from the start. The only improvement I can think of is
>>>>>>> that the user specifies they know the src is read only and to only use
>>>>>>> the target checkpoint. We can 'possibly' make that happen automatically
>>>>>>> if the src specifically fails the write due to permissions.
>> 
>> 
