incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Randall Leeds <>
Subject Re: interested in learning about replication algorithm
Date Wed, 16 Feb 2011 19:24:33 GMT
The algorithm at a high level goes like this:

Get some changes
Check them for revisions that are missing from the target
Pull/Push the missing revisions

These functions map to the public API in an obvious way:
/_all_docs or /_bulk_docs

For any production-ready replicator you'll probably want to checkpoint
your progress.
CouchDB stores checkpoints in documents prefixed with "_local/".
Docs named this way don't replicate, don't show up in views, etc.
Good for internal metadata stuff like this.

Stable checkpointing requires that, up to a certain sequence, all
updates must be flushed to disk on both sides.
Currently this is accomplished with a separate POST to /_ensure_full_commit.
Couch also honors a header on document update requests called

Almost all the replicator code is contained in couch_rep* or couch_replicator*.
The latter is the new replicator code by Filipe, which may have some
dependency on the old code (I'm not sure).

That should be enough to get you started.


On Tue, Feb 15, 2011 at 18:51, Aaron Boxer <> wrote:
> Thanks, guys! I guess I need to dig into the actual code.
> I would like to implement a similar algorithm in C, for another project
> I am working on.
> On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson <> wrote:
>> It's worth mentioning that, like git, the hash also includes the
>> previous contents (and, hence, is dependent on all previous updates),
>> Only identical sequences of updates will yield the same _rev.
>> B.
>> On 15 February 2011 22:37, Randall Leeds <> wrote:
>>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <> wrote:
>>>> Interesting. Thanks!
>>>> How do version ids get generated?  How do the different nodes
>>>> avoid version id collision; i.e. two nodes updating a document with the
>>>> same version id?
>>> The revision id contains both a monotonically increasing number
>>> revision number and a hash of the document contents. The hash breaks
>>> ties (storing the conflict, not resolving it, but deterministically
>>> choosing a privileged version to report as the "newest").
>>> In this manner, should two nodes perform the same update the revision
>>> is said to exist in both places already and replication will note this
>>> and not copy the document again.
>>> -Randall

View raw message