couchdb-replication mailing list archives

From Calvin Metcalf <calvin.metc...@gmail.com>
Subject Re: Checkpointing on read only databases
Date Tue, 15 Apr 2014 18:54:17 GMT
I think the problem is not so much deleting and recreating a database as
wiping a virtual machine and restoring it from a backup. At that point you
have more or less gone back in time with the target database: it has
different contents but the same uuid.
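
A minimal sketch of that scenario, assuming a toy in-memory database where the
update sequence is simply the position in an ordered change log (real CouchDB
sequences are more involved). ToyDb, checkpoint, uuid_check and hash_check are
hypothetical names used only for illustration, not CouchDB or PouchDB APIs. It
also shows how the rolling hash over {id, rev} pairs suggested earlier in the
thread would notice the rewind that a uuid comparison alone lets through.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Hypothetical stand-in for a database: a fixed uuid plus a change log."""
    uuid: str
    changes: list = field(default_factory=list)  # ordered (doc_id, rev) pairs

    def write(self, doc_id, rev):
        self.changes.append((doc_id, rev))
        return len(self.changes)  # assumption: seq == position in the log

    def changes_since(self, seq):
        return self.changes[seq:]

    def rolling_hash(self, seq):
        """Digest of every (id, rev) pair up to and including seq."""
        h = hashlib.sha256()
        for doc_id, rev in self.changes[:seq]:
            h.update(f"{doc_id}:{rev};".encode())
        return h.hexdigest()


def checkpoint(db, seq):
    # What a replicator might record about the database it read from.
    return {"uuid": db.uuid, "seq": seq, "hash": db.rolling_hash(seq)}


def uuid_check(db, cp):
    return db.uuid == cp["uuid"]


def hash_check(db, cp):
    return uuid_check(db, cp) and db.rolling_hash(cp["seq"]) == cp["hash"]


source = ToyDb(uuid="abc123")
for i in range(3):
    source.write(f"doc{i}", "1-a")
backup = list(source.changes)      # file-based backup taken at seq 3
source.write("doc3", "1-a")
source.write("doc4", "1-a")
cp = checkpoint(source, 5)         # replication checkpoints at seq 5

# The machine is wiped and restored from the backup, then new writes arrive.
source.changes = list(backup)
for i in range(5, 8):
    source.write(f"doc{i}", "1-b")

uuid_ok = uuid_check(source, cp)   # same uuid, so this check passes
hash_ok = hash_check(source, cp)   # history up to seq 5 differs, so this fails
# Writes at seq 4 and 5 that a blind resume from since=5 would never fetch:
missed = source.changes[len(backup):cp["seq"]]
```

With only the uuid check, the replicator would resume from since=5 and fetch
just the last write, silently skipping the two documents in `missed`.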


On Tue, Apr 15, 2014 at 2:32 PM, Dale Harvey <dale@arandomurl.com> wrote:

> I don't understand the problem with per-db uuids; the uuid isn't
> multi-valued, nor is it queried
>
>    A is read-only, B is client, B starts replication from A
>    B reads the db uuid from A / itself, generates a replication_id, stores
> on B
>    try to fetch replication checkpoint, if successful we query changes from
> since?
>
> In pouch we store the uuid along with the data, so file-based backups aren't
> a problem; it seems couchdb could / should do that too
>
> This also fixes the problem mentioned on the mailing list, and one I have
> run into personally where people forward db requests but not server
> requests via a proxy
>
>
> On 15 April 2014 19:18, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
>
> > except there is no way to calculate that from outside the database as
> > changes only ever gives the more recent document version.
> >
> >
> > On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <
> calvin.metcalf@gmail.com
> > >wrote:
> >
> > > > oo didn't think of that, yeah uuids wouldn't hurt, though the more I think
> > > > about the rolling hashing on revs, the more I like that
> > >
> > >
> > > On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <
> > adam.kocoloski@gmail.com>wrote:
> > >
> > > >> Yes, but then sysadmins have to be very very careful about restoring from
> > > >> a file-based backup. We run the risk that {uuid, seq} could be
> > > >> multi-valued, which diminishes its value considerably.
> > > >>
> > > >> I like the UUID in general -- we've added them to our internal shard
> > > >> files at Cloudant -- but on their own they're not a bulletproof solution
> > > >> for read-only incremental replications.
> > >>
> > >> Adam
> > >>
> > >> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <
> calvin.metcalf@gmail.com
> > >
> > >> wrote:
> > >> >
> > > >> > I mean if you're going to add new features to couch you could just have
> > > >> > the db generate a random uuid on creation that would be different if it
> > > >> > was deleted and recreated
> > >> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <
> adam.kocoloski@gmail.com>
> > >> wrote:
> > >> >>
> > >> >> Other thoughts:
> > >> >>
> > >> >> - We could enhance the authorization system to have a role that allows
> > >> >> updates to _local docs but nothing else. It wouldn't make sense for
> > >> >> completely untrusted peers, but it could give peace of mind to sysadmins
> > >> >> trying to execute replications with the minimum level of access possible.
> > >> >>
> > >> >> - We could teach the sequence index to maintain a rolling hash of the
> > >> >> {id, rev} pairs that comprise the database up to that sequence, record
> > >> >> that in the replication checkpoint document, and check that it's
> > >> >> unchanged on resume. It's a new API enhancement and it grows the amount
> > >> >> of information stored with each sequence, but it completely closes off
> > >> >> the probabilistic edge case associated with simply checking that the
> > >> >> {id, rev} associated with the checkpointed sequence has not changed.
> > >> >> Perhaps overkill for what is admittedly a pretty low-probability event.
> > >> >>
> > >> >> Adam
> > >> >>
> > >> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <
> > adam.kocoloski@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> Yeah, this is a subtle little thing. The main reason we checkpoint on
> > >> >>> both source and target and compare is to cover the case where the
> > >> >>> source database is deleted and recreated in between replication
> > >> >>> attempts. If that were to happen and the replicator just resumes
> > >> >>> blindly from the checkpoint sequence stored on the target then the
> > >> >>> replication could permanently miss some documents written to the new
> > >> >>> source.
> > >> >>>
> > >> >>> I'd love to have a robust solution for incremental replication of
> > >> >>> read-only databases. To first order a UUID on the source database that
> > >> >>> was fixed at create time could do the trick, but we'll run into trouble
> > >> >>> with file-based backup and restores. If a database file is restored to
> > >> >>> a point before the latest replication checkpoint we'd again be in a
> > >> >>> position of potentially permanently missing updates.
> > >> >>>
> > >> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply
> > >> >>> seq as the checkpoint information would dramatically reduce the
> > >> >>> likelihood of that type of permanent skip in the replication, but it's
> > >> >>> only a probabilistic answer.
> > >> >>>
> > >> >>> Adam
> > >> >>>
> > >> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <
> > >> calvin.metcalf@gmail.com>
> > >> >>> wrote:
> > >> >>>
> > >> >>>> Though currently we have the opposite problem, right, if we delete the
> > >> >>>> target db? (this is me brainstorming)
> > >> >>>>
> > >> >>>> Could we store last rev in addition to last seq?
> > >> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <dale@arandomurl.com>
> > wrote:
> > >> >>>>>
> > >> >>>>> If the src database was to be wiped, when we restarted replication
> > >> >>>>> nothing would happen until the source database caught up to the
> > >> >>>>> previously written checkpoint
> > >> >>>>>
> > >> >>>>> create A, write 5 documents
> > >> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
> > >> >>>>> destroy A
> > >> >>>>> write 4 documents
> > >> >>>>> replicate A -> B, pick up checkpoint from B and to ?since=5
> > >> >>>>> .. no documents written
> > >> >>>>>
> > >> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
> > >> >>>>> is our test that covers it
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On 13 April 2014 18:02, Calvin Metcalf <
> calvin.metcalf@gmail.com>
> > >> >> wrote:
> > >> >>>>>
> > >> >>>>>> If we were to unilaterally switch to checkpoint on target what would
> > >> >>>>>> happen, replications in progress would lose their place?
> > >> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <dale@arandomurl.com>
> > >> wrote:
> > >> >>>>>>>
> > >> >>>>>>> So with checkpointing we write the checkpoint to both A and B and
> > >> >>>>>>> verify they match before using the checkpoint
> > >> >>>>>>>
> > >> >>>>>>> What happens if the src of the replication is read only?
> > >> >>>>>>>
> > >> >>>>>>> As far as I can tell couch will just checkout a
> > >> >>>>>>> checkpoint_commit_error and carry on from the start. The only
> > >> >>>>>>> improvement I can think of is the user specifies they know the src is
> > >> >>>>>>> read only and to only use the target checkpoint; we can 'possibly'
> > >> >>>>>>> make that happen automatically if the src specifically fails the
> > >> >>>>>>> write due to permissions.
> > >> >>
> > >> >>
> > >>
> > >
> > >
> > >
> > > --
> > > -Calvin W. Metcalf
> > >
> >
> >
> >
> > --
> > -Calvin W. Metcalf
> >
>



-- 
-Calvin W. Metcalf
