couchdb-dev mailing list archives

From Alex Besogonov <>
Subject Re: Unique instance IDs?
Date Wed, 14 Dec 2011 04:08:14 GMT
On Mon, Dec 12, 2011 at 10:26 PM, Paul Davis
<> wrote:
>> * Merkle trees are great for two-way synchronization, but it's not immediately clear
>> to me how you'd use them to bootstrap a single source -> target replication. I might
>> just be missing a straightforward extension of the tech here.
> This is the point that's important with checksums and so on. Merkle
> trees are great when you want to mirror structured data but CouchDB
> replication is a mirror operation. Think, N db's replicating to a
> central db: you have a mixture of things which breaks checksums (or at
> least any obvious application I can think of given our internal
> structures)
Uhm. What are the things that break checksums? Right now revision IDs
are _almost_ deterministic, and it's not that hard to make them
completely deterministic. For replication purposes nothing else matters.

To be exact: the only entity used for ID generation is the '[Deleted,
OldStart, OldRev, Body, Atts2]' tuple, and only the 'Atts2' field can be
non-deterministic. That can be fixed (along with other minor
forward-looking features like explicit versioning).
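To make the idea concrete, here is a minimal sketch of what a fully
deterministic rev hash over that tuple could look like. This is an
illustration only, not CouchDB's actual term_to_binary-based scheme; the
function name, the canonical-JSON encoding, and the attachment sorting are
all assumptions:

```python
import hashlib
import json

def deterministic_rev_hash(deleted, old_start, old_rev, body, atts):
    """Hypothetical sketch: hash the same fields that feed rev
    generation, via a canonical encoding so the result is fully
    deterministic regardless of attachment ordering."""
    # Sort attachment metadata so insertion order cannot perturb the digest.
    canonical_atts = sorted(atts.items())
    payload = json.dumps(
        [deleted, old_start, old_rev, body, canonical_atts],
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

The point of the canonicalization is that two replicas producing the same
logical revision always produce the same rev ID, which is the property the
hash-tree protocol below relies on.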

Then it's easy to devise a protocol that replicates based on hash trees.
I'm thinking of this protocol:
1) The current state of a replicated database is identified by a hash.
Suppose we have unidirectional replication A->B.

Let's denote the state of the initial database A as A1 and B's as B1.

We store the ancestry as a list of hashes outside the database (so it
doesn't influence the hash of the database).
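Step 1 can be sketched as a tiny data structure (all names here are
assumptions for illustration, not CouchDB internals): the database hash
covers only the documents, while the ancestry list lives beside it and is
deliberately excluded from that hash.

```python
import hashlib

class ReplicaState:
    """Sketch of step 1: a replica identified by a content hash, with
    the replication ancestry kept *outside* that hash."""
    def __init__(self):
        self.docs = {}        # the database contents
        self.ancestry = []    # hashes of applied changesets, not hashed below

    def state_hash(self):
        # Hash only the documents, in a canonical order, so the
        # ancestry list cannot influence the database identity.
        h = hashlib.md5()
        for doc_id in sorted(self.docs):
            h.update(doc_id.encode())
            h.update(repr(self.docs[doc_id]).encode())
        return h.hexdigest()
```

Two replicas holding the same documents therefore report the same state
hash even if their replication histories differ.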

2) As the first step B sends its list of replication ancestry.

It's actually not even required to send the whole hashes each time; just
the first 4 bytes of each hash are enough. That way, even 1 million
records of replication history would take only 4 MB. The 'A' server then
replies with its own set of hashes that match those initial bytes. If
there are none, the client falls back to the usual replication.

So at this step 'B' knows the most recent common ancestor and requests
the changes that have happened since that point in time. Each changeset,
naturally, has its own hash.
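Step 2 can be sketched as follows. This is a toy model under stated
assumptions (MD5 changeset hashes, hex-string prefixes, an
oldest-to-newest ancestry list); none of these names are real CouchDB
APIs:

```python
import hashlib

def make_changeset_hash(payload: bytes) -> str:
    """Illustrative: identify a changeset by an MD5 hex digest."""
    return hashlib.md5(payload).hexdigest()

def ancestry_prefixes(ancestry, n=4):
    """B's side: send only the first n bytes (2*n hex chars) of each
    ancestry hash instead of the full 16-byte value."""
    return [h[:2 * n] for h in ancestry]

def matching_hashes(own_ancestry, peer_prefixes, n=4):
    """A's side: reply with the full hashes whose prefix matches one
    of the peer's prefixes."""
    wanted = set(peer_prefixes)
    return [h for h in own_ancestry if h[:2 * n] in wanted]

def most_recent_common_ancestor(own_ancestry, matched):
    """B's side: the newest ancestry entry that A also reported, or
    None, in which case B falls back to the usual replication."""
    confirmed = set(matched)
    for h in reversed(own_ancestry):   # ancestry is oldest-to-newest
        if h in confirmed:
            return h
    return None
```

With 4-byte prefixes, accidental collisions are possible in principle, but
A replies with full hashes, so B only ever trusts an exact match.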

3) After these changes are applied and merged, B's state is the A1 state
plus all of B's changes that may have happened since. B then stores the
hashes of the changesets that have been applied.
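Step 3 reduces to applying each incoming changeset and appending its hash
to the ancestry list. A minimal sketch, with a naive last-write-wins merge
standing in for CouchDB's real conflict handling (all names are
illustrative):

```python
def apply_changesets(docs, ancestry, changesets):
    """Sketch of step 3: apply each incoming changeset to B's doc set
    and record the changeset hash in the ancestry list (which, per
    step 1, is kept outside the database hash)."""
    for cs in changesets:
        for doc_id, doc in cs["docs"].items():
            docs[doc_id] = doc           # naive merge: last write wins
        ancestry.append(cs["hash"])      # future syncs start from here
    return docs, ancestry
```

On the next replication round, the appended hashes are exactly what makes
the step-2 prefix exchange find a more recent common ancestor.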

That's it. It should work, as far as I can see (it's 3am local time, so I
might be missing something).

Overhead: 16 bytes of hash information per changeset.
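The arithmetic behind the overhead figures, assuming a 16-byte (MD5-size)
hash per changeset and the 4-byte prefixes from step 2, checks out:

```python
# Per-changeset storage: one full 16-byte hash in the ancestry list.
entries = 1_000_000
wire_bytes = entries * 4      # 4-byte prefixes sent over the wire in step 2
stored_bytes = entries * 16   # full hashes kept locally per changeset

assert wire_bytes == 4_000_000      # ~4 MB on the wire, as claimed above
assert stored_bytes == 16_000_000   # ~16 MB stored for 1M changesets
```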
