Subject: Re: Unique instance IDs?
From: Adam Kocoloski
Date: Mon, 12 Dec 2011 22:19:42 -0500
To: dev@couchdb.apache.org

On Dec 12, 2011, at 10:10 PM, Jason Smith wrote:

> On Tue, Dec 13, 2011 at 8:40 AM, Paul Davis wrote:
>>> If there were a hypothetical single query which let the receiver
>>> assess its exact relationship to an arbitrary sender's data, I don't
>>> think "starts over" would sound as awful.
>>
>> I agree wholeheartedly. And the easiest way I see of making that
>> happen is to decouple the host and db identities in such a way that
>> this is a reality. It's possible there's something elegant we could
>> pull from things like Merkle trees. I've spent time considering it and
>> haven't thought of anything, but I'd be tickled pink if there were a
>> reasonable solution there.
>
> Yeah. That is why I keep thinking of a checksum that works well with
> incremental map/reduce. I always recall that CRC32 is a commutative,
> associative checksum algorithm. It could hypothetically give you a
> checksum of the entire tree, and all subtrees down to the leaves, as a
> Couch reduce function. So the idea is to reduce the by_seq index. You
> get checksums of the database or subsets free or cheap.
>
> At this point I am out of my expertise though, so I defer.
>
> --
> Iris Couch

Yep, that's a Merkle tree, and it brings us back to where this thread sat 24 hours ago. A couple of points:

* You want to stuff the checksums in the id_tree, not the seq_tree. Each replica's seq_tree orders documents by its own local update sequence, so if you use the seq_tree you'll never be able to apply updates that get the checksums aligned.

* Merkle trees are great for two-way synchronization, but it's not immediately clear to me how you'd use them to bootstrap a single source -> target replication. I might just be missing a straightforward extension of the tech here.

Adam
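A minimal sketch of the id_tree vs. seq_tree point, under a deliberately simplified model: each replica's trees are flattened to ordered lists of `doc_id:rev` leaves, and a plain binary SHA-256 Merkle tree stands in for whatever checksum a CouchDB B+tree reduce would actually store. The helper names (`merkle_root`, `apply_updates`, `id_tree_root`, `seq_tree_root`) are hypothetical and not CouchDB APIs. Two replicas that end up holding the same documents, but received the updates in a different order, get the same root over id order and different roots over seq order.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest; an assumed stand-in for the per-node checksum."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root hash of a binary Merkle tree over an ordered list of byte leaves."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Parent hash depends on child order, so leaf order changes the root.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def apply_updates(updates):
    """Replay (doc_id, rev) updates locally; return (id_index, seq_index).

    id_index:  doc_id -> latest rev (keyed by document id, like an id_tree)
    seq_index: local seq -> (doc_id, rev), keeping only each doc's latest
               entry (keyed by local update sequence, like a seq_tree)
    """
    id_index, seq_index, latest_seq = {}, {}, {}
    for seq, (doc_id, rev) in enumerate(updates, start=1):
        if doc_id in latest_seq:
            del seq_index[latest_seq[doc_id]]  # older seq entry is superseded
        latest_seq[doc_id] = seq
        seq_index[seq] = (doc_id, rev)
        id_index[doc_id] = rev
    return id_index, seq_index


def id_tree_root(id_index):
    """Checksum over leaves sorted by doc id: independent of update order."""
    leaves = [f"{d}:{r}".encode() for d, r in sorted(id_index.items())]
    return merkle_root(leaves)


def seq_tree_root(seq_index):
    """Checksum over leaves sorted by local seq: depends on update order."""
    leaves = [f"{d}:{r}".encode() for _, (d, r) in sorted(seq_index.items())]
    return merkle_root(leaves)
```

For example, replaying `[("a", "1-x"), ("b", "1-y")]` on one replica and `[("b", "1-y"), ("a", "1-x")]` on another gives equal `id_tree_root` values but unequal `seq_tree_root` values, which is why only the id-ordered checksums can ever line up across replicas.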