From: Adam Kocoloski
To: dev@couchdb.apache.org
Subject: Re: associating UUIDs to DBs
Date: Thu, 4 Feb 2010 11:17:04 -0500
Message-Id: <2EBB675B-A3C2-494A-B3FA-19D089D38268@apache.org>

On Feb 4, 2010, at 10:44 AM, Paul Davis wrote:

> On Thu, Feb 4, 2010 at 10:19 AM, Adam Kocoloski wrote:
>> On Feb 3, 2010, at 4:53 AM, Brian Candler wrote:
>>
>>> On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
>>>> If couchdb tracked replication by a Merkle tree, it would obsolete the
>>>> update_seq mechanism?
>>>
>>> Only if you weren't doing filtered/selective replication. And probably only
>>> if there was nothing else different between the two databases (e.g. _local
>>> docs, _design docs, reader acls etc)
>>
>> Correct, Merkle trees are only useful if you expect the two databases to be completely identical. But Bob's right, I'm essentially proposing that our by_seq btree is extended into a full Merkle tree for this particular use-case.
>>
>> Adam
>
> Most intriguing. Could you expand on that a bit?
>
> Paul

Hi Paul,

The more I think about it using by_seq may not be the optimal choice here. Consider the case where I snapshot my .couch file over to a new server, and in the meantime I update the document that was occupying update_seq 1 on the original.
The analysis I proposed above would conclude that the replication needs to start from the beginning, which is true, but overlooks the fact that only one document has changed.

An alternative would be to do the Merkle stuff in the by_id tree, and instead of identifying the last update_seq where two DBs are identical, identify the set of documents that differ between the two DBs. Replicate just those documents using Filipe's new patch, then record a checkpoint at the source's latest update_seq. You're now fully caught up in case you're planning any future _changes-based incremental replications.

If we went ahead and implemented this I think the UUID becomes superfluous from the replicator's perspective. You wouldn't want to restrict this Merkle tree check to UUID-matched DBs, as it would be useful for reducing entropy in a sharded database cluster that stores multiple copies of each document in different database shards. In fact, IIRC that was a Dynamo feature in the original Amazon paper.

Adam
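
For illustration, the by_id comparison described above could look roughly like the following. This is a minimal Python sketch, not CouchDB code: the names (build_tree, differing_buckets, differing_docs), the fixed DEPTH, and the choice to bucket documents by a hash of their id are all assumptions made for the example. The real by_id btree is keyed and shaped differently. Each database is modelled as a plain dict mapping doc id to revision string.

```python
import hashlib

DEPTH = 4  # hypothetical tree depth: 2**4 = 16 leaf buckets (must be <= 8 here)


def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def bucket_of(doc_id: str) -> int:
    # Place a document in a leaf bucket using the top DEPTH bits of its id hash,
    # so both databases get trees of the same shape regardless of their contents.
    return _h(doc_id.encode())[0] >> (8 - DEPTH)


def build_tree(docs: dict) -> list:
    """Return a list of levels: levels[0] is the root, levels[DEPTH] the leaves."""
    buckets = [[] for _ in range(2 ** DEPTH)]
    for doc_id, rev in docs.items():
        buckets[bucket_of(doc_id)].append((doc_id, rev))
    # A leaf hash covers the sorted (id, rev) pairs that fall in its bucket.
    level = [_h(repr(sorted(b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        # Each parent hash covers its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    levels.reverse()
    return levels


def differing_buckets(a: list, b: list, depth: int = 0, index: int = 0) -> list:
    # Descend only into subtrees whose hashes disagree.
    if a[depth][index] == b[depth][index]:
        return []
    if depth == DEPTH:
        return [index]
    return (differing_buckets(a, b, depth + 1, 2 * index)
            + differing_buckets(a, b, depth + 1, 2 * index + 1))


def differing_docs(source: dict, target: dict) -> list:
    """Doc ids whose revision differs, or which exist on only one side."""
    changed = set(differing_buckets(build_tree(source), build_tree(target)))
    ids = set()
    for docs in (source, target):
        for doc_id in docs:
            if bucket_of(doc_id) in changed and source.get(doc_id) != target.get(doc_id):
                ids.add(doc_id)
    return sorted(ids)


# Snapshot a database, then update one document on the original.
source = {"a": "1-x", "b": "1-y", "c": "1-z"}
target = dict(source)
source["a"] = "2-w"
print(differing_docs(source, target))
```

In this toy model only the ids returned by differing_docs would need to be replicated. The checkpoint at the source's latest update_seq would then be recorded separately, and that step is not modelled here.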