couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject idea for transitive replication checkpoints
Date Fri, 18 Feb 2011 02:45:22 GMT

Hi all,

Paul and I were chatting at today's CouchDB Hack Night about a way to fast-forward
replications (thanks Max for the prodding!).  It's non-trivial, but I think the benefit for
big networks of CouchDB servers can be substantial.

The basic idea is that if A replicates with B, and B with C, then a new replication between
A and C should not need to start from scratch.  I think we can accomplish this as follows:

1) Store the target update sequence along with the source sequence in the checkpoint document,
at least in the checkpoint document on the target.  The following tuple is important: {Source,
_local ID, Session ID, SourceSeq, TargetSeq}.  Using that syntax, let's say we have the following
replication records:

On A
{A, _local/Foo, Bar, 5, _TargetSeq} % we could omit the target sequence on the source

On B
{A, _local/Foo, Bar, 5, 10} % 5 on A corresponds to 10 on B
{B, _local/Baz, Bif, 15, _TargetSeq}

On C
{B, _local/Baz, Bif, 15, 7} % 15 on B corresponds to 7 on C

We know that A -> B happened before B -> C.
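
For anyone who prefers to read the records above as data, here is a rough Python sketch of the same tuples. The `Checkpoint` type, the `records` dict and the `target_seq_for` helper are names made up for illustration, not anything in CouchDB; the values are the ones from the example.

```python
from typing import Dict, List, NamedTuple, Optional


class Checkpoint(NamedTuple):
    """One replication record: {Source, _local ID, Session ID, SourceSeq, TargetSeq}."""
    source: str
    local_id: str
    session_id: str
    source_seq: int
    target_seq: Optional[int]  # None where the record omits it (e.g. on the source)


# Hypothetical in-memory stand-in for the checkpoint records held on each server.
records: Dict[str, List[Checkpoint]] = {
    "A": [Checkpoint("A", "_local/Foo", "Bar", 5, None)],
    "B": [Checkpoint("A", "_local/Foo", "Bar", 5, 10),
          Checkpoint("B", "_local/Baz", "Bif", 15, None)],
    "C": [Checkpoint("B", "_local/Baz", "Bif", 15, 7)],
}


def target_seq_for(server: str, source: str, source_seq: int) -> Optional[int]:
    """Return the sequence on `server` that corresponds to `source_seq` on `source`."""
    for cp in records[server]:
        if (cp.source == source and cp.source_seq == source_seq
                and cp.target_seq is not None):
            return cp.target_seq
    return None
```

With these records, `target_seq_for("B", "A", 5)` gives 10 and `target_seq_for("C", "B", 15)` gives 7, matching the comments in the listing.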

2) During the B -> C replication, when we reach source sequence number 10, the _changes
feed from B will deliver some extra information like

{A, _local/Foo, Bar, 5}

which will be stored at C. This may require a new disk-resident btree keyed on update sequence,
or at least an in-memory index constructed by walking the _local docs btree.
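
A minimal sketch of step 2, assuming the index keyed on update sequence is a plain dict and the _changes feed is a list of (seq, doc id) rows. The names `TransitiveCheckpoint` and `annotate_changes` are hypothetical. One assumption beyond the text: the record is attached to the first row whose sequence is at or past the indexed one, since the feed may not contain a row at exactly that sequence.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class TransitiveCheckpoint(NamedTuple):
    """The extra information delivered to the target: {Source, _local ID, Session ID, SourceSeq}."""
    source: str
    local_id: str
    session_id: str
    source_seq: int


def annotate_changes(
    changes: Iterable[Tuple[int, str]],
    index: Dict[int, TransitiveCheckpoint],
) -> Iterator[Tuple[int, str, List[TransitiveCheckpoint]]]:
    """Yield (seq, doc_id, extras) rows, attaching each indexed checkpoint once.

    `index` maps an update sequence on this server (B) to the checkpoint
    recorded at that sequence. `changes` must be in increasing seq order.
    """
    pending = sorted(index.items())  # ordered by update sequence
    i = 0
    for seq, doc_id in changes:
        extras = []
        # Deliver every checkpoint whose sequence the feed has now reached or passed.
        while i < len(pending) and pending[i][0] <= seq:
            extras.append(pending[i][1])
            i += 1
        yield seq, doc_id, extras
```

In the example, B would hold `{10: TransitiveCheckpoint("A", "_local/Foo", "Bar", 5)}`, and C would store whatever arrives in `extras` alongside the documents it replicates.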

3) When we trigger the A -> C replication, C will walk the full checkpoint records in its
_local tree and find no mention of A, but then it will also consult the "transitive" checkpoints
and find the {A, _local/Foo, Bar, 5} record.  It'll consult _local/Foo on A, find that the
session ID Bar is still present, and conclude that it can fast-forward the replication and
start from update sequence 5.  It will then remove that transitive checkpoint and replace
it with a full regular checkpoint.
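
The decision in step 3 could look roughly like the following. This is only a sketch: `fast_forward_seq` and its arguments are invented for illustration, the source's _local docs are modelled as a dict from _local ID to the set of session IDs recorded there, and preferring the highest sequence when several transitive checkpoints match is my assumption. Writing the full regular checkpoint afterwards is left to the caller.

```python
from typing import Dict, List, NamedTuple, Set


class TransitiveCheckpoint(NamedTuple):
    source: str
    local_id: str
    session_id: str
    source_seq: int


def fast_forward_seq(
    source: str,
    transitive: List[TransitiveCheckpoint],
    source_local_docs: Dict[str, Set[str]],
) -> int:
    """Return the source sequence to start from, or 0 to start from scratch.

    `transitive` holds the transitive checkpoints stored on the target (C);
    the one that gets used is removed from the list.
    """
    candidates = sorted(
        (cp for cp in transitive if cp.source == source),
        key=lambda cp: cp.source_seq,
        reverse=True,  # assumption: try the most advanced checkpoint first
    )
    for cp in candidates:
        # Only trust the checkpoint if the source still knows this session ID.
        if cp.session_id in source_local_docs.get(cp.local_id, set()):
            transitive.remove(cp)
            return cp.source_seq
    return 0
```

For the example, `fast_forward_seq("A", [TransitiveCheckpoint("A", "_local/Foo", "Bar", 5)], {"_local/Foo": {"Bar"}})` returns 5. If session Bar is absent from _local/Foo on A, it returns 0 and the transitive checkpoint is left in place.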

If server A crashes after the A -> B replication and restores from a backup that was recorded
before the replication, the session ID Bar will be missing from _local/Foo, so when we try
to do the A -> C replication we won't fast-forward.  This is the correct behavior.

Hopefully this is comprehensible to someone other than me.  We spent some time trying to poke
holes in it, but it's entirely possible there are other things we didn't consider that will
prevent it from working.  Cheers,

Adam