couchdb-replication mailing list archives

From Calvin Metcalf <calvin.metc...@gmail.com>
Subject Re: Checkpointing on read only databases
Date Tue, 15 Apr 2014 20:19:52 GMT
So if only leaf nodes are used to calculate the hash, then how the number
was calculated, via rolling checksum or Merkle tree, would be opaque from
outside the system: if it's easier for Couch to do a Merkle reduction it
could do that, while something like Pouch could do a rolling checksum.
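
Here's a rough sketch of what I mean (toy Python, not CouchDB or PouchDB
code; the XOR-of-SHA-1s combine is just one illustrative order-independent
choice, and a real Merkle root would also depend on tree shape):

    import hashlib

    def rev_hash(rev_id):
        # hash one leaf revision id to a 160-bit integer
        return int.from_bytes(hashlib.sha1(rev_id.encode()).digest(), "big")

    def rolling(leaf_revs):
        # "rolling checksum" style: fold in one leaf rev at a time
        acc = 0
        for rev in leaf_revs:
            acc ^= rev_hash(rev)
        return acc

    def tree_reduce(leaf_revs):
        # btree-reduction / Merkle style: combine per-node partial results
        if len(leaf_revs) == 1:
            return rev_hash(leaf_revs[0])
        mid = len(leaf_revs) // 2
        return tree_reduce(leaf_revs[:mid]) ^ tree_reduce(leaf_revs[mid:])

    leaves = ["3-917fa23", "1-a00bfc0", "7-deadbee"]
    assert rolling(leaves) == tree_reduce(leaves)  # same number either way

Since the two sides only ever exchange the final number, neither needs to
know how the other one computed it.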


On Tue, Apr 15, 2014 at 4:07 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:

> I think it'd make sense to only include the leaf revs.
>
> It's annoying that an update to an old document blows away such a big
> chunk of the Merkle tree (which makes recovery from a mismatch harder),
> but it does have the advantage over the rolling hash of only requiring
> extra space in the inner btree nodes in couch_btree. Not sure how easily
> one could layer a Merkle tree on top of other replicating systems (this
> is replication@, after all).
>
> Adam
>
> > On Apr 15, 2014, at 3:57 PM, Chris Anderson <jchris@couchbase.com> wrote:
> >
> > I think compaction preserves old rev ids and sequences, but rev-stemming
> > could result in a mismatch unless only the leaf revs are hashed in the
> > Merkle reduction.
> >
> >> On Tuesday, April 15, 2014, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >>
> >> won't compaction make that tricky to calculate retroactively?
> >>
> >>
> >> On Tue, Apr 15, 2014 at 3:10 PM, Chris Anderson <jchris@couchbase.com> wrote:
> >>
> >>> If you want to know if checkpoints are the same, maybe a combination
> >>> of the sequence number and a Merkle tree of document revision ids
> >>> would work? It would require adding a reduction to the by-sequence
> >>> tree, but you'd be able to know if two sequences also refer to the
> >>> same content, e.g. is the source database the same one you talked to
> >>> last, or just a new one with the same sequence number.
> >>>
> >>> Chris
> >>>
> >>>
> >>> On Tue, Apr 15, 2014 at 11:54 AM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >>>
> >>>> I think the problem is not so much deleting and recreating a database
> >>>> as wiping a virtual machine and restoring from a backup: now you have
> >>>> more or less gone back in time with the target database, and it has
> >>>> different stuff but the same uuid.
> >>>>
> >>>>
> >>>>> On Tue, Apr 15, 2014 at 2:32 PM, Dale Harvey <dale@arandomurl.com> wrote:
> >>>>
> >>>>> I don't understand the problem with per-db uuids; the uuid isn't
> >>>>> multivalued, nor is it queried:
> >>>>>
> >>>>>   A is read only, B is the client, B starts replication from A
> >>>>>   B reads the db uuid from A / itself, generates a replication_id,
> >>>>>   stores it on B
> >>>>>   try to fetch the replication checkpoint; if successful we query
> >>>>>   changes from since?
> >>>>>
> >>>>> In Pouch we store the uuid along with the data, so file-based backups
> >>>>> aren't a problem; seems CouchDB could / should do that too.
> >>>>>
> >>>>> This also fixes the problem mentioned on the mailing list, and one I
> >>>>> have run into personally, where people forward db requests but not
> >>>>> server requests via a proxy.
> >>>>>
> >>>>>
> >>>>> On 15 April 2014 19:18, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >>>>>
> >>>>>> except there is no way to calculate that from outside the database,
> >>>>>> as the changes feed only ever gives the most recent document version.
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >>>>>>
> >>>>>>> oo didn't think of that, yeah uuids wouldn't hurt, though the more
> >>>>>>> I think about the rolling hashing on revs, the more I like that
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <adam.kocoloski@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Yes, but then sysadmins have to be very very careful about
> >>>>>>>> restoring from a file-based backup. We run the risk that
> >>>>>>>> {uuid, seq} could be multi-valued, which diminishes its value
> >>>>>>>> considerably.
> >>>>>>>>
> >>>>>>>> I like the UUID in general -- we've added them to our internal
> >>>>>>>> shard files at Cloudant -- but on their own they're not a
> >>>>>>>> bulletproof solution for read-only incremental replications.
> >>>>>>>>
> >>>>>>>> Adam
> >>>>>>>>
> >>>>>>>>> On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <calvin.metcalf@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> I mean if you're going to add new features to couch you could
> >>>>>>>>> just have the db generate a random uuid on creation that would
> >>>>>>>>> be different if it was deleted and recreated
> >> --
> >> -Calvin W. Metcalf
> >
> >
> > --
> > —
> > Chris Anderson  @jchris
> > http://www.couchbase.com
>
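
To make Chris's by-sequence reduction idea (quoted above) a bit more
concrete, here's a toy sketch of what a checkpoint could record so that
two databases at the same seq are still distinguishable. Again this is
plain Python, not couch_btree, and the digest is the same illustrative
XOR-of-hashes as in my sketch further up:

    import hashlib

    def rev_hash(rev_id):
        return int.from_bytes(hashlib.sha1(rev_id.encode()).digest(), "big")

    # toy stand-in for a by-seq index that carries a reduction value
    class BySeqIndex:
        def __init__(self):
            self.seq = 0
            self.digest = 0   # reduction over the current leaf revs
            self.leaf = {}    # doc id -> current leaf rev

        def update(self, doc_id, new_rev):
            old = self.leaf.get(doc_id)
            if old is not None:
                self.digest ^= rev_hash(old)   # retract the replaced leaf
            self.leaf[doc_id] = new_rev
            self.digest ^= rev_hash(new_rev)
            self.seq += 1

        def checkpoint(self):
            # what a replicator could record instead of a bare seq
            return {"seq": self.seq, "digest": self.digest}

    a, b = BySeqIndex(), BySeqIndex()
    for db in (a, b):
        db.update("doc1", "1-aaa")
        db.update("doc2", "1-bbb")
    assert a.checkpoint() == b.checkpoint()       # identical content

    a.update("doc3", "1-ccc")                     # both databases reach seq 3,
    b.update("doc1", "2-ddd")                     # but with different content
    assert a.checkpoint()["seq"] == b.checkpoint()["seq"]
    assert a.checkpoint()["digest"] != b.checkpoint()["digest"]

With a flat order-independent combine like this, maintaining the value on
an update to an old document is cheap (retract the old leaf, add the new
one), but unlike a real Merkle tree you can't drill down to localize where
two databases diverge; you only learn that they do.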
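
And to pin down the flow Dale describes above, here's a simplified pull
from a read-only source. The per-database "uuid" field on the db info
response is assumed purely for the sake of the sketch; the _local
checkpoint doc and _changes?since= are real, but error handling, _rev
bookkeeping and the actual document transfer are all omitted:

    import hashlib
    import requests

    SOURCE = "http://a.example.com:5984/source"   # A: read only
    TARGET = "http://localhost:5984/target"       # B: the client

    def db_uuid(db_url):
        # assumed: db info carries a stable per-database uuid
        return requests.get(db_url).json()["uuid"]

    # replication id derived from both databases' uuids, so it survives the
    # databases being reached through a proxy or renamed
    rep_id = hashlib.sha1(
        (db_uuid(SOURCE) + db_uuid(TARGET)).encode()).hexdigest()

    # the checkpoint lives only on B, because A is read only
    resp = requests.get(f"{TARGET}/_local/{rep_id}")
    since = resp.json().get("source_seq", 0) if resp.status_code == 200 else 0

    # resume the changes feed from the recorded sequence
    changes = requests.get(f"{SOURCE}/_changes", params={"since": since}).json()
    for row in changes["results"]:
        pass  # fetch row["id"] from A and write it to B here

    # record the new checkpoint on B only
    requests.put(f"{TARGET}/_local/{rep_id}",
                 json={"source_seq": changes["last_seq"]})

If the uuid lives inside the database file itself, the way Pouch stores it
with the data, a file-based backup carries it along, but as Adam points
out above, a restore can then reuse a {uuid, seq} pair for different
content, which is exactly the case the rev-digest idea is meant to catch.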



-- 
-Calvin W. Metcalf
