Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Received-SPF: pass (nike.apache.org: domain of adam.kocoloski@gmail.com
 designates 209.85.221.194 as permitted sender)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1077)
Subject: Re: associating UUIDs to DBs
From: Adam Kocoloski <kocolosk@apache.org>
In-Reply-To: <e2111bbb1002040744y6eb3ce38q1bc9b3dfaae870f9@mail.gmail.com>
Date: Thu, 4 Feb 2010 11:17:04 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <2EBB675B-A3C2-494A-B3FA-19D089D38268@apache.org>
References: <cadf1c691002011058r1cf1fbfbk45a41db9ea3932f7@mail.gmail.com>
  <ec7a93a1002012355s7d6c06fbwca6e99718750243d@mail.gmail.com>
  <ec7a93a1002020007t4fe00734g6b12faabf570e716@mail.gmail.com>
  <ec7a93a1002021125j11a87224h4b98ac64066cc7d9@mail.gmail.com>
  <e282921e1002021139u660e81e2n77ee23b897aaad90@mail.gmail.com>
  <ec7a93a1002021148u20751c9chd1aed06e95c05257@mail.gmail.com>
  <C96C3CAC-77BB-4CCA-A3C2-BD734419B176@apache.org>
 <46aeb24f1002021341h3a3e6a62l9ab92274646f2c74@mail.gmail.com>
  <20100203095327.GA8099@uk.tiscali.com>
 <7CBFD4B9-23DB-4626-9FC6-81095E1A4161@apache.org>
 <e2111bbb1002040744y6eb3ce38q1bc9b3dfaae870f9@mail.gmail.com>
To: dev@couchdb.apache.org

On Feb 4, 2010, at 10:44 AM, Paul Davis wrote:

> On Thu, Feb 4, 2010 at 10:19 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> On Feb 3, 2010, at 4:53 AM, Brian Candler wrote:
>>
>>> On Tue, Feb 02, 2010 at 09:41:28PM +0000, Robert Newson wrote:
>>>> If couchdb tracked replication by a Merkle tree, it would obsolete the
>>>> update_seq mechanism?
>>>
>>> Only if you weren't doing filtered/selective replication. And probably only
>>> if there was nothing else different between the two databases (e.g. _local
>>> docs, _design docs, reader acls etc)
>>
>> Correct, Merkle trees are only useful if you expect the two databases to be
>> completely identical.  But Bob's right, I'm essentially proposing that our
>> by_seq btree is extended into a full Merkle tree for this particular use-case.
>>
>> Adam
>
> Most intriguing. Could you expand on that a bit?
>
> Paul

Hi Paul,

The more I think about it, the less sure I am that by_seq is the right
tree for this.  Consider the case where I copy a snapshot of my .couch
file over to a new server, and in the meantime I update the document
that was occupying update_seq 1 on the original.  That update moves the
document to a new update_seq, so the two by_seq trees diverge right at
sequence 1.  The analysis I proposed earlier would conclude that the
replication needs to start from the beginning, which is true, but it
overlooks the fact that only one document has changed.
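A toy model of that failure mode, assuming the earlier analysis amounts
to finding the highest update_seq up to which both by_seq indexes agree.
The Db class and last_common_seq function are hypothetical stand-ins
written in Python for illustration, not CouchDB code; an update is
modeled as dropping the document's old by_seq entry and re-adding it at
a new, higher sequence.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Db:
    """Hypothetical model of a database: a counter plus a by_seq index."""
    update_seq: int = 0
    by_seq: Dict[int, str] = field(default_factory=dict)  # seq -> doc_id

    def update(self, doc_id: str) -> None:
        # Drop the doc's old by_seq entry, then re-add it at a new, higher seq.
        self.by_seq = {s: d for s, d in self.by_seq.items() if d != doc_id}
        self.update_seq += 1
        self.by_seq[self.update_seq] = doc_id


def last_common_seq(a: Db, b: Db) -> int:
    """Highest seq S such that both by_seq indexes agree on every seq <= S."""
    common = 0
    for seq in range(1, min(a.update_seq, b.update_seq) + 1):
        if a.by_seq.get(seq) != b.by_seq.get(seq):
            break
        common = seq
    return common


orig = Db()
for doc in ["a", "b", "c"]:
    orig.update(doc)
snapshot = Db(orig.update_seq, dict(orig.by_seq))  # the copied .couch file
orig.update("a")  # touch the doc that occupied update_seq 1
print(last_common_seq(orig, snapshot))  # 0: "start from the beginning"
```

Even though only "a" changed, the comparison finds no common prefix at
all, which is exactly the over-conservative result described above.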

An alternative would be to do the Merkle stuff in the by_id tree and,
instead of identifying the last update_seq where the two DBs are
identical, identify the set of documents that differ between them.
Replicate just those documents using Filipe's new patch, then record a
checkpoint at the source's latest update_seq.  You're now fully caught
up in case you're planning any future _changes-based incremental
replications.
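A minimal sketch of that by_id comparison, in Python for illustration
only; none of these names exist in CouchDB.  It assumes the by_id index
can be summarized as a map from doc ID to current revision, and it swaps
the real B-tree for a fixed-depth tree whose leaves are buckets chosen by
hashing the doc ID, so both databases build trees of the same shape even
when their on-disk B-trees differ.  That bucketing (and the DEPTH knob)
is an assumption of this sketch.  Comparing hashes from the root down and
descending only into mismatched subtrees yields the set of documents to
replicate.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# Hypothetical model of a by_id index: {doc_id: current_rev}.
Docs = Dict[str, str]

DEPTH = 4  # hypothetical tuning knob: the tree has 2**DEPTH leaf buckets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bucket(doc_id: str, depth: int) -> int:
    # Hash the ID so both databases place every doc in the same leaf.
    return int.from_bytes(_h(doc_id.encode()), "big") % (2 ** depth)


def build_tree(docs: Docs, depth: int = DEPTH) -> Tuple[List[List[bytes]], List[List[str]]]:
    """Return (levels, buckets); levels[0] holds leaf hashes, levels[-1] == [root]."""
    buckets: List[List[str]] = [[] for _ in range(2 ** depth)]
    for doc_id in sorted(docs):
        buckets[_bucket(doc_id, depth)].append(doc_id)
    # Each leaf hashes the sorted (id, rev) pairs that fall in its bucket.
    levels = [[_h("".join(f"{d}\0{docs[d]}\0" for d in b).encode()) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Parent j covers children 2j and 2j + 1 on the level below.
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels, buckets


def differing_docs(a: Docs, b: Docs, depth: int = DEPTH) -> Set[str]:
    """IDs whose current rev differs, or that exist on only one side."""
    ta, ba = build_tree(a, depth)
    tb, bb = build_tree(b, depth)
    result: Set[str] = set()
    frontier = [0]  # node indices to inspect, starting at the root
    for level in range(depth, -1, -1):
        # Only descend into subtrees whose hashes disagree.
        differing = [i for i in frontier if ta[level][i] != tb[level][i]]
        if level == 0:
            for i in differing:
                for doc_id in set(ba[i]) | set(bb[i]):
                    if a.get(doc_id) != b.get(doc_id):
                        result.add(doc_id)
        else:
            frontier = [c for i in differing for c in (2 * i, 2 * i + 1)]
    return result


source = {"a": "1-x", "b": "2-y", "c": "1-z"}
target = {"a": "1-x", "b": "1-y", "d": "1-w"}
print(sorted(differing_docs(source, target)))  # ['b', 'c', 'd']
```

Identical subtrees are skipped after a single hash comparison, so the
work scales with the number of differing buckets rather than the size of
the database; the rebuild-from-scratch here stands in for a tree that
would in practice be maintained incrementally.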

If we went ahead and implemented this, I think the UUID would become
superfluous from the replicator's perspective.  You wouldn't want to
restrict this Merkle tree check to UUID-matched DBs, because it would
also be useful for reducing entropy in a sharded database cluster that
stores multiple copies of each document in different database shards.
In fact, IIRC, Merkle-tree anti-entropy between replicas was a Dynamo
feature in the original Amazon paper.

Adam