From: Randall Leeds
To: user@couchdb.apache.org
Cc: Aaron Boxer
Date: Wed, 16 Feb 2011 11:24:33 -0800
Subject: Re: interested in learning about replication algorithm

The algorithm at a high level goes like this:

  Get some changes
  Check them for revisions that are missing from the target
  Pull/Push the missing revisions
  Repeat

These functions map to the public API in an obvious way:

  /_changes
  /_missing_revs
  /_all_docs or /_bulk_docs

For any production-ready replicator you'll probably want to checkpoint
your progress. CouchDB stores checkpoints in documents prefixed with
"_local/". Docs named this way don't replicate, don't show up in
views, etc. Good for internal metadata stuff like this.

Stable checkpointing requires that all updates up to a certain sequence
be flushed to disk on both sides. Currently this is accomplished with a
separate POST to /_ensure_full_commit. Couch also honors a header on
document update requests called X-Couch-Full-Commit.

Almost all the replicator code is contained in couch_rep* or
couch_replicator*. The latter is the new replicator code by Filipe,
which may have some dependency on the old code (I'm not sure).

That should be enough to get you started.

-Randall

On Tue, Feb 15, 2011 at 18:51, Aaron Boxer wrote:
> Thanks, guys! I guess I need to dig into the actual code.
>
> I would like to implement a similar algorithm in C, for another project
> I am working on.
>
>
>
> On Tue, Feb 15, 2011 at 5:48 PM, Robert Newson wrote:
>> It's worth mentioning that, like git, the hash also includes the
>> previous contents (and, hence, is dependent on all previous updates).
>>
>> Only identical sequences of updates will yield the same _rev.
>>
>> B.
>>
>> On 15 February 2011 22:37, Randall Leeds wrote:
>>> On Tue, Feb 15, 2011 at 07:30, Aaron Boxer wrote:
>>>> Interesting. Thanks!
>>>>
>>>> How do version ids get generated? How do the different nodes
>>>> avoid version id collision; i.e. two nodes updating a document with the
>>>> same version id?
>>>
>>> The revision id contains both a monotonically increasing revision
>>> number and a hash of the document contents. The hash breaks
>>> ties (storing the conflict, not resolving it, but deterministically
>>> choosing a privileged version to report as the "newest").
>>>
>>> In this manner, should two nodes perform the same update the revision
>>> is said to exist in both places already and replication will note this
>>> and not copy the document again.
>>>
>>> -Randall
>>>
>>
>
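
For anyone reimplementing this, the loop described at the top of this message (get changes, check for missing revisions, transfer them, checkpoint, repeat) might be sketched roughly as below. This is only an in-memory illustration, not CouchDB's replicator code: the `Store` class, its methods, and `replicate` are hypothetical names standing in for the roles of /_changes, /_missing_revs and the document transfer, and the flush-to-disk step before checkpointing is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory database; NOT CouchDB's actual API."""
    docs: dict = field(default_factory=dict)   # (doc_id, rev) -> body
    log: list = field(default_factory=list)    # update sequence of (doc_id, rev)
    local: dict = field(default_factory=dict)  # "_local/" docs, never replicated

    def put(self, doc_id, rev, body):
        # Storing a revision that already exists is a no-op.
        if (doc_id, rev) not in self.docs:
            self.docs[(doc_id, rev)] = body
            self.log.append((doc_id, rev))

    def changes(self, since, limit):
        # Plays the role of /_changes: rows of (seq, doc_id, rev) after `since`.
        rows = self.log[since:since + limit]
        return [(since + i + 1, d, r) for i, (d, r) in enumerate(rows)]

    def missing_revs(self, revs):
        # Plays the role of /_missing_revs: which revisions are absent here?
        return [key for key in revs if key not in self.docs]


def replicate(source, target, rep_id, batch=2):
    """Copy revisions the target lacks; return how many were copied."""
    checkpoint_id = "_local/" + rep_id
    since = target.local.get(checkpoint_id, 0)  # resume from last checkpoint
    copied = 0
    while True:
        rows = source.changes(since, batch)      # 1. get some changes
        if not rows:
            return copied
        wanted = [(d, r) for _, d, r in rows]
        for doc_id, rev in target.missing_revs(wanted):  # 2. check target
            target.put(doc_id, rev, source.docs[(doc_id, rev)])  # 3. transfer
            copied += 1
        since = rows[-1][0]
        # A real replicator would ensure both sides are flushed to disk
        # before recording the checkpoint; this sketch skips that step.
        source.local[checkpoint_id] = since
        target.local[checkpoint_id] = since      # 4. checkpoint, then repeat
```

Running `replicate` a second time with the same `rep_id` picks up from the stored sequence, so revisions already present on the target are neither re-read nor copied again.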