incubator-couchdb-user mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: interested in learning about replication algorithm
Date Wed, 16 Feb 2011 19:24:33 GMT
The algorithm at a high level goes like this:

Get some changes
Check them for revisions that are missing from the target
Pull/Push the missing revisions
Repeat

These steps map to the public API in an obvious way:
/_changes
/_missing_revs
/_all_docs or /_bulk_docs
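
If it helps to see the loop spelled out, here's a rough Python sketch of one
pass against those endpoints. The database URLs are made up, and error
handling, attachments, and the finer points of the response formats are
glossed over, so treat it as pseudocode with real endpoint names rather than
a finished replicator:

import json
import requests

source = "http://localhost:5984/source_db"   # hypothetical source database
target = "http://localhost:5984/target_db"   # hypothetical target database

def replicate_once(since=0):
    # 1. Get some changes from the source, starting after the last checkpoint.
    changes = requests.get(f"{source}/_changes",
                           params={"since": since, "style": "all_docs"}).json()

    # 2. Ask the target which of those revisions it is missing.
    id_revs = {row["id"]: [c["rev"] for c in row["changes"]]
               for row in changes["results"]}
    missing = requests.post(f"{target}/_missing_revs", json=id_revs).json()

    # 3. Pull the missing revisions (with their revision histories) from the
    #    source and push them to the target with new_edits=false so the
    #    target stores them as-is instead of minting new revisions.
    docs = []
    for doc_id, revs in missing.get("missing_revs", {}).items():
        rows = requests.get(f"{source}/{doc_id}",
                            params={"open_revs": json.dumps(revs), "revs": "true"},
                            headers={"Accept": "application/json"}).json()
        docs.extend(row["ok"] for row in rows if "ok" in row)
    if docs:
        requests.post(f"{target}/_bulk_docs",
                      json={"docs": docs, "new_edits": False})

    # 4. Repeat from the last sequence we processed.
    return changes["last_seq"]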

For any production-ready replicator you'll probably want to checkpoint
your progress.
CouchDB stores checkpoints in documents prefixed with "_local/".
Docs named this way don't replicate, don't show up in views, etc.
Good for internal metadata stuff like this.
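
A checkpoint document can be as simple as the last update sequence you've
fully processed. Something like the following, where the doc id and field
name are just illustrative (not necessarily what CouchDB's own replicator
uses):

import requests

def load_checkpoint(db, rep_id):
    # Read the last recorded sequence, or start from 0 if there's no checkpoint yet.
    resp = requests.get(f"{db}/_local/{rep_id}")
    if resp.status_code == 404:
        return 0, None
    doc = resp.json()
    return doc["last_seq"], doc["_rev"]

def save_checkpoint(db, rep_id, seq, rev=None):
    doc = {"last_seq": seq}
    if rev is not None:
        doc["_rev"] = rev          # pass along the _rev we read, to be safe
    requests.put(f"{db}/_local/{rep_id}", json=doc)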

Stable checkpointing requires that, up to a given sequence, all
updates have been flushed to disk on both sides.
Currently this is accomplished with a separate POST to /_ensure_full_commit.
Couch also honors a header on document update requests called
X-Couch-Full-Commit.
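
In HTTP terms that's just a POST before you write the checkpoint (same
made-up target URL as in the sketch above):

import requests

target = "http://localhost:5984/target_db"   # hypothetical target, as above

# Flush pending target-side updates to disk before recording a checkpoint.
requests.post(f"{target}/_ensure_full_commit",
              headers={"Content-Type": "application/json"})

# Or ask for durability on the write itself via the X-Couch-Full-Commit header:
# requests.post(f"{target}/_bulk_docs", json={"docs": docs, "new_edits": False},
#               headers={"X-Couch-Full-Commit": "true"})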

Almost all the replicator code is contained in couch_rep* or couch_replicator*.
The latter is the new replicator code by Filipe, which may have some
dependency on the old code (I'm not sure).

That should be enough to get you started.

-Randall

On Tue, Feb 15, 2011 at 18:51, Aaron Boxer <boxerab@gmail.com> wrote:
> Thanks, guys! I guess I need to dig into the actual code.
>
> I would like to implement a similar algorithm in C, for another project
> I am working on.
>
>
>
> On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson <robert.newson@gmail.com> wrote:
>> It's worth mentioning that, like git, the hash also includes the
>> previous contents (and, hence, is dependent on all previous updates).
>>
>> Only identical sequences of updates will yield the same _rev.
>>
>> B.
>>
>> On 15 February 2011 22:37, Randall Leeds <randall.leeds@gmail.com> wrote:
>>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer <boxerab@gmail.com> wrote:
>>>> Interesting. Thanks!
>>>>
>>>> How do version ids get generated? How do the different nodes
>>>> avoid version id collision, i.e. two nodes updating a document with the
>>>> same version id?
>>>
>>> The revision id contains both a monotonically increasing revision
>>> number and a hash of the document contents. The hash breaks
>>> ties (storing the conflict, not resolving it, but deterministically
>>> choosing a privileged version to report as the "newest").
>>>
>>> In this manner, should two nodes perform the same update, the revision
>>> is said to exist in both places already and replication will note this
>>> and not copy the document again.
>>>
>>> -Randall
>>>
>>
>
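
To make the tie-breaking described above concrete, here's a toy comparison
in Python. The exact ordering CouchDB uses internally may differ in detail,
so take it as an illustration of the idea rather than the real
implementation:

def rev_key(rev):
    # Split "N-hash" into (int position, hash string) for comparison.
    pos, rev_hash = rev.split("-", 1)
    return int(pos), rev_hash

def winner(leaf_revs):
    # Higher position wins; equal positions fall back to the lexically
    # greater hash, so every node picks the same "newest" revision.
    return max(leaf_revs, key=rev_key)

print(winner(["2-b91bb807b4685080c6a651115ff558f5",
              "2-c1592ce7b31cc26e91d2f2029c57e621"]))
# -> 2-c1592ce7b31cc26e91d2f2029c57e621 (same position, greater hash wins)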
