couchdb-replication mailing list archives

From Dale Harvey <d...@arandomurl.com>
Subject Re: Checkpointing on read only databases
Date Wed, 16 Apr 2014 00:57:29 GMT
Sorry, I still don't understand the problem here.

The uuid is stored inside the database file, so you either have the same data
and the same uuid, or neither of them?


On 15 April 2014 19:54, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:

> I think the problem is not so much deleting and recreating a database as
> wiping a virtual machine and restoring from a backup: now you have more or
> less gone back in time with the target database, and it has different stuff
> but the same uuid.
>
>
> On Tue, Apr 15, 2014 at 2:32 PM, Dale Harvey <dale@arandomurl.com> wrote:
>
> > I don't understand the problem with per-db uuids, so the uuid isn't
> > multivalued nor is it queried
> >
> >    A is read-only, B is client, B starts replication from A
> >    B reads the db uuid from A / itself, generates a replication_id,
> >    stores it on B
> >    try to fetch replication checkpoint, if successful we query changes
> >    from since?
> >
> > In pouch we store the uuid along with the data, so file-based backups
> > aren't a problem, seems couchdb could / should do that too
> >
> > This also fixes the problem mentioned on the mailing list, and one I have
> > run into personally where people forward db requests but not server
> > requests via a proxy
> >
> >
> > On 15 April 2014 19:18, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >
> > > > except there is no way to calculate that from outside the database as
> > > > changes only ever gives the most recent document version.
> > >
> > >
> > > > On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> > > >
> > > > > oo didn't think of that, yeah uuids wouldn't hurt, though the more I
> > > > > think about the rolling hashing on revs, the more I like that
> > > >
> > > >
> > > > > On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:
> > > > >
> > > > >> Yes, but then sysadmins have to be very very careful about restoring
> > > > >> from a file-based backup. We run the risk that {uuid, seq} could be
> > > > >> multi-valued, which diminishes its value considerably.
> > > > >>
> > > > >> I like the UUID in general -- we've added them to our internal shard
> > > > >> files at Cloudant -- but on their own they're not a bulletproof
> > > > >> solution for read-only incremental replications.
> > > >>
> > > >> Adam
> > > >>
> > > > >> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> > > > >> >
> > > > >> > I mean if you're going to add new features to couch you could just
> > > > >> > have the db generate a random uuid on creation that would be
> > > > >> > different if it was deleted and recreated
> > > > >> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <adam.kocoloski@gmail.com> wrote:
> > > >> >>
> > > >> >> Other thoughts:
> > > >> >>
> > > > >> >> - We could enhance the authorization system to have a role that
> > > > >> >> allows updates to _local docs but nothing else. It wouldn't make
> > > > >> >> sense for completely untrusted peers, but it could give peace of
> > > > >> >> mind to sysadmins trying to execute replications with the minimum
> > > > >> >> level of access possible.
> > > > >> >>
> > > > >> >> - We could teach the sequence index to maintain a rolling hash of
> > > > >> >> the {id,rev} pairs that comprise the database up to that sequence,
> > > > >> >> record that in the replication checkpoint document, and check that
> > > > >> >> it's unchanged on resume. It's a new API enhancement and it grows
> > > > >> >> the amount of information stored with each sequence, but it
> > > > >> >> completely closes off the probabilistic edge case associated with
> > > > >> >> simply checking that the {id, rev} associated with the checkpointed
> > > > >> >> sequence has not changed. Perhaps overkill for what is admittedly
> > > > >> >> a pretty low-probability event.
> > > >> >>
> > > >> >> Adam
> > > >> >>
> > > > >> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:
> > > > >> >>
> > > > >> >>> Yeah, this is a subtle little thing. The main reason we checkpoint
> > > > >> >>> on both source and target and compare is to cover the case where
> > > > >> >>> the source database is deleted and recreated in between
> > > > >> >>> replication attempts. If that were to happen and the replicator
> > > > >> >>> just resumes blindly from the checkpoint sequence stored on the
> > > > >> >>> target then the replication could permanently miss some documents
> > > > >> >>> written to the new source.
> > > > >> >>>
> > > > >> >>> I'd love to have a robust solution for incremental replication of
> > > > >> >>> read-only databases. To first order a UUID on the source database
> > > > >> >>> that was fixed at create time could do the trick, but we'll run
> > > > >> >>> into trouble with file-based backup and restores. If a database
> > > > >> >>> file is restored to a point before the latest replication
> > > > >> >>> checkpoint we'd again be in a position of potentially permanently
> > > > >> >>> missing updates.
> > > > >> >>>
> > > > >> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of
> > > > >> >>> simply seq as the checkpoint information would dramatically reduce
> > > > >> >>> the likelihood of that type of permanent skip in the replication,
> > > > >> >>> but it's only a probabilistic answer.
> > > >> >>>
> > > >> >>> Adam
> > > >> >>>
> > > > >> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> > > > >> >>>>
> > > > >> >>>> Though currently we have the opposite problem, right, if we
> > > > >> >>>> delete the target db? (this is me brainstorming)
> > > > >> >>>>
> > > > >> >>>> Could we store last rev in addition to last seq?
> > > > >> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <dale@arandomurl.com> wrote:
> > > > >> >>>>>
> > > > >> >>>>> If the src database was to be wiped, when we restarted replication
> > > > >> >>>>> nothing would happen until the source database caught up to the
> > > > >> >>>>> previously written checkpoint
> > > > >> >>>>>
> > > > >> >>>>> create A, write 5 documents
> > > > >> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
> > > > >> >>>>> destroy A
> > > > >> >>>>> write 4 documents
> > > > >> >>>>> replicate A -> B, pick up checkpoint from B and to ?since=5
> > > > >> >>>>> .. no documents written
> > > > >> >>>>>
> > > > >> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
> > > > >> >>>>> is our test that covers it
> > > >> >>>>>
> > > >> >>>>>
> > > > >> >>>>> On 13 April 2014 18:02, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> If we were to unilaterally switch to checkpoint on target what
> > > > >> >>>>>> would happen, replications in progress would lose their place?
> > > > >> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <dale@arandomurl.com> wrote:
> > > > >> >>>>>>>
> > > > >> >>>>>>> So with checkpointing we write the checkpoint to both A and B
> > > > >> >>>>>>> and verify they match before using the checkpoint
> > > > >> >>>>>>>
> > > > >> >>>>>>> What happens if the src of the replication is read only?
> > > > >> >>>>>>>
> > > > >> >>>>>>> As far as I can tell couch will just checkout a
> > > > >> >>>>>>> checkpoint_commit_error and carry on from the start. The only
> > > > >> >>>>>>> improvement I can think of is the user specifies they know the
> > > > >> >>>>>>> src is read only and to only use the target checkpoint, we can
> > > > >> >>>>>>> 'possibly' make that happen automatically if the src
> > > > >> >>>>>>> specifically fails the write due to permissions.
> > > >> >>
> > > >> >>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > -Calvin W. Metcalf
> > > >
> > >
> > >
> > >
> > > --
> > > -Calvin W. Metcalf
> > >
> >
>
>
>
> --
> -Calvin W. Metcalf
>
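
As an illustration of the scenario walked through in this thread (create A,
write 5 documents, checkpoint 5 on B, destroy A, write 4 documents), here is a
minimal Python sketch. It is not CouchDB or PouchDB code: ToyDatabase,
rolling_hash, make_checkpoint and resume_point are hypothetical names made up
for this example, and it assumes the rolling hash over {id, rev} pairs is
maintained inside the database itself, as Adam suggests. It shows a blind
resume from since=5 returning nothing, and a checkpoint that also records the
hash noticing that the source was recreated.

```python
import hashlib


class ToyDatabase:
    """Hypothetical in-memory stand-in for a database with a sequence index."""

    def __init__(self):
        self.changes = []  # (seq, doc_id, rev) tuples in write order

    def write(self, doc_id, rev):
        # Each write is assigned the next sequence number.
        seq = len(self.changes) + 1
        self.changes.append((seq, doc_id, rev))
        return seq

    def changes_since(self, since):
        # Everything written after the given sequence number.
        return [c for c in self.changes if c[0] > since]

    def rolling_hash(self, upto_seq):
        # Hash of every {id, rev} pair written up to and including upto_seq.
        digest = hashlib.sha256()
        for _seq, doc_id, rev in self.changes[:upto_seq]:
            digest.update(f"{doc_id}:{rev};".encode("utf-8"))
        return digest.hexdigest()


def make_checkpoint(source, seq):
    # Stored on the target only, so the source can stay read-only.
    return {"seq": seq, "hash": source.rolling_hash(seq)}


def resume_point(source, checkpoint):
    # Resume from the checkpoint only if the source history still matches it;
    # otherwise fall back to replicating from the start (sequence 0).
    seq = checkpoint["seq"]
    if len(source.changes) < seq:
        return 0
    if source.rolling_hash(seq) != checkpoint["hash"]:
        return 0
    return seq


# create A, write 5 documents, replicate A -> B, write checkpoint 5 on B
a = ToyDatabase()
for i in range(5):
    a.write(f"doc{i}", "1-aaa")
checkpoint = make_checkpoint(a, 5)
unchanged_resume = resume_point(a, checkpoint)

# destroy A, recreate it, write 4 documents
a = ToyDatabase()
for i in range(4):
    a.write(f"new{i}", "1-bbb")

blind = a.changes_since(checkpoint["seq"])           # blind ?since=5
safe = a.changes_since(resume_point(a, checkpoint))  # hash-checked resume
```

With the blind resume nothing is replicated until the new A passes sequence 5,
while the hash-checked resume falls back to 0 and picks up all 4 new documents.
A seq-only or {seq, id, rev} checkpoint could be compared the same way; the
hash here is only one of the options raised in the thread.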
