couchdb-replication mailing list archives

From Calvin Metcalf <calvin.metc...@gmail.com>
Subject Re: Checkpointing on read only databases
Date Tue, 15 Apr 2014 18:18:04 GMT
Except there is no way to calculate that from outside the database, as
_changes only ever gives the most recent version of a document.
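A minimal sketch of what an external client actually sees from _changes,
assuming a local CouchDB and the Python requests library (the database URL is
hypothetical):

import requests

DB = "http://localhost:5984/source_db"  # hypothetical database URL

# Each _changes row carries the doc id and only its current winning rev(s);
# the intermediate revisions behind earlier sequences are not exposed, so a
# hash over the full {id, rev} history cannot be rebuilt from the outside.
for row in requests.get(DB + "/_changes").json()["results"]:
    print(row["seq"], row["id"], [c["rev"] for c in row["changes"]])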


On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:

> Ooh, didn't think of that. Yeah, UUIDs wouldn't hurt, though the more I
> think about the rolling hash on revs, the more I like it.
>
>
> On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:
>
>> Yes, but then sysadmins have to be very very careful about restoring from
>> a file-based backup. We run the risk that {uuid, seq} could be
>> multi-valued, which diminishes its value considerably.
>>
>> I like the UUID in general -- we've added them to our internal shard
>> files at Cloudant -- but on their own they're not a bulletproof solution
>> for read-only incremental replications.
>>
>> Adam
>>
>> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
>> >
>> > I mean, if you're going to add new features to Couch, you could just have
>> > the db generate a random UUID on creation that would be different if it
>> > was deleted and recreated.
>> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <adam.kocoloski@gmail.com> wrote:
>> >>
>> >> Other thoughts:
>> >>
>> >> - We could enhance the authorization system to have a role that allows
>> >> updates to _local docs but nothing else. It wouldn't make sense for
>> >> completely untrusted peers, but it could give peace of mind to sysadmins
>> >> trying to execute replications with the minimum level of access possible.
>> >>
>> >> - We could teach the sequence index to maintain a rolling hash of the
>> >> {id, rev} pairs that comprise the database up to that sequence, record
>> >> that in the replication checkpoint document, and check that it's
>> >> unchanged on resume. It's a new API enhancement and it grows the amount
>> >> of information stored with each sequence, but it completely closes off
>> >> the probabilistic edge case associated with simply checking that the
>> >> {id, rev} associated with the checkpointed sequence has not changed.
>> >> Perhaps overkill for what is admittedly a pretty low-probability event.
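A minimal sketch of the rolling-hash idea, in Python rather than in the
sequence index itself; the function name and checkpoint shape are
illustrative, not an existing CouchDB API:

import hashlib

def extend_rolling_hash(prev_hash_hex, doc_id, rev):
    # Fold one {id, rev} pair into the running hash kept for the next
    # sequence. Purely illustrative: the real feature would live inside
    # the sequence index, not in client code.
    h = hashlib.sha256()
    h.update(bytes.fromhex(prev_hash_hex) if prev_hash_hex else b"")
    h.update(doc_id.encode("utf-8"))
    h.update(rev.encode("utf-8"))
    return h.hexdigest()

# The checkpoint document would then store {seq, rolling_hash}; on resume
# the replicator asks the source for the hash at that seq and compares.
state = ""
for seq, doc_id, rev in [(1, "a", "1-abc"), (2, "b", "1-def")]:
    state = extend_rolling_hash(state, doc_id, rev)
checkpoint = {"seq": 2, "rolling_hash": state}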
>> >>
>> >> Adam
>> >>
>> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <adam.kocoloski@gmail.com>
>> >> wrote:
>> >>
>> >>> Yeah, this is a subtle little thing. The main reason we checkpoint on
>> >>> both source and target and compare is to cover the case where the
>> >>> source database is deleted and recreated in between replication
>> >>> attempts. If that were to happen and the replicator just resumed
>> >>> blindly from the checkpoint sequence stored on the target, then the
>> >>> replication could permanently miss some documents written to the new
>> >>> source.
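A rough sketch of that source/target comparison, assuming the requests
library and hypothetical database URLs; the field names mirror the general
shape of CouchDB's _local checkpoint documents but should be read as
illustrative:

import requests

SRC = "http://localhost:5984/source_db"    # hypothetical
TGT = "http://localhost:5984/target_db"    # hypothetical
CHECKPOINT = "/_local/my-replication-id"   # hypothetical checkpoint doc id

def start_seq():
    # Resume only if source and target agree on the checkpoint; otherwise
    # replay from sequence 0. If the source db was deleted and recreated,
    # its _local checkpoint is gone (or no longer matches) and we start over.
    src = requests.get(SRC + CHECKPOINT)
    tgt = requests.get(TGT + CHECKPOINT)
    if src.status_code == 200 and tgt.status_code == 200:
        s, t = src.json(), tgt.json()
        if s.get("session_id") == t.get("session_id"):
            return s.get("source_last_seq", 0)
    return 0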
>> >>>
>> >>> I'd love to have a robust solution for incremental replication of
>> >>> read-only databases. To first order, a UUID on the source database that
>> >>> was fixed at create time could do the trick, but we'll run into trouble
>> >>> with file-based backup and restores. If a database file is restored to
>> >>> a point before the latest replication checkpoint, we'd again be in a
>> >>> position of potentially permanently missing updates.
>> >>>
>> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply
>> >>> seq as the checkpoint information would dramatically reduce the
>> >>> likelihood of that type of permanent skip in the replication, but it's
>> >>> only a probabilistic answer.
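A minimal sketch of that {seq, id, rev} check on resume, assuming the
requests library; the URL and checkpoint shape are illustrative:

import requests

SRC = "http://localhost:5984/source_db"  # hypothetical

def resume_seq(checkpoint):
    # checkpoint is e.g. {"seq": 5, "id": "doc-17", "rev": "3-abc..."}.
    # Resume from the recorded seq only if that exact revision still exists.
    r = requests.get(SRC + "/" + checkpoint["id"],
                     params={"rev": checkpoint["rev"]})
    if r.status_code == 200:
        return checkpoint["seq"]
    # The rev is missing: the db may have been deleted/recreated or restored
    # from a backup, so replay from the start. Still only probabilistic: a
    # recreated db could by chance hold the same id and rev at a new seq.
    return 0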
>> >>>
>> >>> Adam
>> >>>
>> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
>> >>>
>> >>>> Though currently we have the opposite problem, right, if we delete the
>> >>>> target db? (This is me brainstorming.)
>> >>>>
>> >>>> Could we store the last rev in addition to the last seq?
>> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <dale@arandomurl.com> wrote:
>> >>>>>
>> >>>>> If the src database was to be wiped, when we restarted replication
>> >>>>> nothing would happen until the source database caught up to the
>> >>>>> previously written checkpoint:
>> >>>>>
>> >>>>> create A, write 5 documents
>> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
>> >>>>> destroy A and recreate it
>> >>>>> write 4 documents
>> >>>>> replicate A -> B, pick up checkpoint from B and read from ?since=5
>> >>>>> .. no documents written
>> >>
>> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
>> >>>>> is our test that covers it
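The linked test is the canonical reproduction; a rough Python/HTTP
translation of the same scenario, with hypothetical database URLs and
assuming CouchDB 1.x-style integer sequences, would be:

import requests

A = "http://localhost:5984/a"  # hypothetical source
B = "http://localhost:5984/b"  # hypothetical target (checkpoint "5" lives here)

requests.put(A)
for i in range(5):
    requests.put("%s/doc%d" % (A, i), json={"n": i})
# ... replicate A -> B, which stores checkpoint seq 5 on B ...

requests.delete(A)   # destroy A
requests.put(A)      # and recreate it
for i in range(4):
    requests.put("%s/new%d" % (A, i), json={"n": i})

# Resuming from the stale checkpoint skips everything: the recreated database
# has only reached sequence 4, so ?since=5 returns no rows.
print(requests.get(A + "/_changes", params={"since": 5}).json()["results"])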
>> >>>>>
>> >>>>>
>> >>>>> On 13 April 2014 18:02, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
>> >>>>>
>> >>>>>> If we were to unilaterally switch to checkpointing on the target,
>> >>>>>> what would happen? Replications in progress would lose their place?
>> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <dale@arandomurl.com> wrote:
>> >>>>>>>
>> >>>>>>> So with checkpointing we write the checkpoint to both A and B and
>> >>>>>>> verify they match before using the checkpoint.
>> >>>>>>>
>> >>>>>>> What happens if the src of the replication is read only?
>> >>>>>>>
>> >>>>>>> As far as I can tell Couch will just hit a checkpoint_commit_error
>> >>>>>>> and carry on from the start. The only improvement I can think of is
>> >>>>>>> that the user specifies they know the src is read only and to only
>> >>>>>>> use the target checkpoint; we can 'possibly' make that happen
>> >>>>>>> automatically if the src specifically fails the write due to
>> >>>>>>> permissions.
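A minimal sketch of that fallback, assuming the requests library; the URLs,
the checkpoint doc id, and the error handling are illustrative, not the
replicator's actual code:

import requests

SRC = "http://localhost:5984/source_db"   # hypothetical, read-only to us
TGT = "http://localhost:5984/target_db"   # hypothetical
CHECKPOINT = "/_local/my-replication-id"  # hypothetical checkpoint doc id

def save_checkpoint(doc):
    # Always record the checkpoint on the target, then try the source.
    # A 401/403 from the source is the hint that it is effectively read
    # only, so later resumes would trust the target copy alone.
    requests.put(TGT + CHECKPOINT, json=doc)
    r = requests.put(SRC + CHECKPOINT, json=doc)
    return r.status_code not in (401, 403)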
>> >>
>> >>
>>
>
>
>
> --
> -Calvin W. Metcalf
>



-- 
-Calvin W. Metcalf
