couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: documentation of replication protocol?
Date Fri, 23 Apr 2010 14:58:05 GMT
On Apr 21, 2010, at 8:33 AM, Miles Fidelman wrote:

> J Chris Anderson wrote:
>> On Apr 20, 2010, at 7:29 PM, Miles Fidelman wrote:
>>   
>>> I've been looking, but can't seem to find any good documentation of the inter-node
protocol used for replication.
>>>     
>> As far as I know, the best source for documentation is the code, right now.
>>   
> <snip>
>> This is the hard coding (in Ruby) I had to add to use the CouchDB replicator to
pull from the Booth server:
>> 
>> http://github.com/jchris/booth/commit/2deff74e03838a6e7ef95b725c4342a08239a2b8#commitcomment-68685
>> 
>>   
> aaarrrgggh......
> 
> I don't suppose anyone out there has scribbled down anything resembling a sequence diagram
or flowchart or list of bullet points or something that summarizes the steps that happen,
and the code that gets run when POST /_replicate is invoked, or an ASN.1-like summary of the
messages that get exchanged between two couch instances during replication?
> 
> right now, replication reminds me of the old Sidney Harris Cartoon, "then a miracle occurs"
(http://www.sciencecartoonsplus.com/pages/gallery.php)
> 
> -- 
> In theory, there is no difference between theory and practice.
> In<fnord>  practice, there is.   .... Yogi Berra

Hi Miles,

Simon Metson reminded me that I wrote down something like this for him a few months back.
 Here it is.  It describes the replication workflow using inline document attachments, rather
than the more efficient multipart requests which are supported in 0.11.  Hope it helps.  Regards,

Adam

On 8 Dec 2009, at 01:42, Adam Kocoloski wrote:

> So, the sequence of calls depends on whether you're pulling updates from this remote
server or pushing updates to it.  Let's consider the two cases separately:
> 
> ## Pull Replication (remote source, local target)
> 
> ### HEAD /db
> Respond with a 200 status code and you're good.
> 
> ### GET /db/_local/<rep id>
> The replicator checkpoints its progress in these _local documents.  You can respond with
a 404 if you like, otherwise the response should be JSON that looks very much like a replication
response, e.g. the one described here:
> 
> http://books.couchdb.org/relax/reference/replication#Replication%20in%20Detail
> 
> Basically, if the _local doc exists on both the source and target DBs, and the two documents
agree on the value of "source_last_seq", the replicator will start from that update sequence
on the source.
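
As a rough illustration of the checkpoint comparison described above, here is a minimal Python sketch. The function name `pick_start_seq` is hypothetical, and falling back to sequence 0 when a checkpoint is missing or the two values disagree is an assumption; the message only describes the case where both documents agree.

```python
def pick_start_seq(source_doc, target_doc):
    """Return the update sequence to resume from, or 0 to start over."""
    # A missing checkpoint (for example a 404 on either side) is modelled as None.
    if source_doc is None or target_doc is None:
        return 0  # assumption: no checkpoint means replicate from the beginning
    src_seq = source_doc.get("source_last_seq")
    tgt_seq = target_doc.get("source_last_seq")
    # Resume only when both sides recorded the same sequence.
    if src_seq is not None and src_seq == tgt_seq:
        return src_seq
    return 0  # assumption: disagreement also means starting over
```

The returned value would then be used as the `since` parameter of the `_changes` request.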
> 
> ### GET /db/_changes?style=all_docs&heartbeat=10000&since=N[&feed=continuous]
> 
> This is the hard part.  The replicator makes this request on a separate connection to
your server, asking for a list of changes since N (the source_last_seq from the previous step).
 If the replication is meant to be permanent, the feed=continuous parameter will be supplied.
 The best reference for the response format is definitely the O'Reilly book:
> 
> http://books.couchdb.org/relax/reference/change-notifications
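
To make the request shape concrete, the following Python sketch builds the `_changes` URL with the parameters named in the heading above. The helper name `changes_url` and the example host are made up for illustration; only the query parameters come from the message.

```python
from urllib.parse import urlencode

def changes_url(db_url, since, continuous=False):
    """Build the _changes URL a replicator would request from the source."""
    params = [("style", "all_docs"), ("heartbeat", 10000), ("since", since)]
    if continuous:
        # Only supplied when the replication is meant to be permanent.
        params.append(("feed", "continuous"))
    return db_url.rstrip("/") + "/_changes?" + urlencode(params)

# Example with a hypothetical local server:
url = changes_url("http://localhost:5984/db", 12, continuous=True)
```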
> 
> ### GET /db/docid?revs=true&latest=true&open_revs=["1-23420432",...]
> 
> You'll see one of these for each updated document if the update does not already exist
on the target. I believe the response is a JSON array:
> 
> [{"ok":{"_id":"docid","_rev":"1-23420432", ..rest of doc}}, {"missing":"some-bad-rev"}]
> 
> The "missing" case is very rare and is usually the result of somebody racing the replicator.
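
A client consuming that array might separate the two kinds of entries roughly as follows. This is a sketch that assumes the response has exactly the shape shown above (a list of objects keyed by either "ok" or "missing"); the function name `split_open_revs` is hypothetical.

```python
import json

def split_open_revs(body):
    """Split an open_revs response into found documents and missing revisions."""
    docs, missing = [], []
    for entry in json.loads(body):
        if "ok" in entry:
            docs.append(entry["ok"])            # the full document at that revision
        elif "missing" in entry:
            missing.append(entry["missing"])    # a revision id the source could not find
    return docs, missing
```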
> 
> ### GET /db/docid/attachment?rev=1-234923042
> 
> Attachments are downloaded separately during pull replication.  The correct response
is the binary data.
> 
> ### PUT /db/_local/<rep id>
> 
> Periodically the replicator will try to save an updated _local doc with the new replication
history. The response is {"ok":true, "rev":NewRevId}
> 
> That's it for pull replication.
> 
> ## Push Replication (local source, remote target)
> 
> The _local doc calls are still there, but now we have two new POSTs:
> 
> POST /db/_missing_revs -d '{"docid1":["1-24323423"], "docid2":["2-23434534"]}'
> 
> This is the replicator asking the target if these document revisions are already saved
there.  The response is a list of the ones that are missing:
> 
> {"missing_revs":{"docid2":["2-23434534"]}}
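
For someone implementing the target side, the `_missing_revs` logic could look something like this Python sketch. The in-memory `stored` mapping (document id to a set of revision ids) is an assumption made for illustration and says nothing about how CouchDB itself stores revisions.

```python
def missing_revs(stored, request):
    """Return the revisions from `request` that the target does not have yet."""
    missing = {}
    for doc_id, revs in request.items():
        have = stored.get(doc_id, set())
        absent = [rev for rev in revs if rev not in have]
        if absent:
            # Documents with nothing missing are left out of the response.
            missing[doc_id] = absent
    return {"missing_revs": missing}
```

With `stored = {"docid1": {"1-24323423"}}` and the request body from the example above, this returns the response shown in the message.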
> 
> POST /db/_bulk_docs -d '{"new_edits":false, "docs":[... array of documents ...]}'
> 
> This one is exactly like the regular _bulk_docs call.  The new_edits:false parameter
tells the target not to throw conflict errors, but instead to save all these updates, as conflict revisions
if necessary. Currently attachments are inlined, although in 0.11 we'll be doing special multipart
PUTs for documents with attachments instead of using _bulk_docs (so we don't need to Base64
encode them). Best,
> 
> Adam
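
For reference, the `_bulk_docs` request body used during push replication can be assembled as in the sketch below. The helper name `bulk_docs_body` is hypothetical, and the documents are assumed to already carry their `_id` and `_rev` fields, since with `new_edits` set to false the target is asked to store the revisions as given.

```python
import json

def bulk_docs_body(docs):
    """Serialize documents into a _bulk_docs body that preserves their revisions."""
    # new_edits=False asks the target to store revisions as-is, creating
    # conflict revisions if necessary instead of rejecting the update.
    return json.dumps({"new_edits": False, "docs": list(docs)})

payload = bulk_docs_body([{"_id": "docid2", "_rev": "2-23434534", "value": 1}])
```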

