couchdb-dev mailing list archives

From Adam Kocoloski
Subject Re: Unique instance IDs?
Date Tue, 13 Dec 2011 03:19:42 GMT
On Dec 12, 2011, at 10:10 PM, Jason Smith wrote:

> On Tue, Dec 13, 2011 at 8:40 AM, Paul Davis wrote:
>>> If there were a hypothetical single query which let the receiver
>>> assess its exact relationship to an arbitrary sender's data, I don't
>>> think "starts over" would sound as awful.
>> I agree wholeheartedly. And the easiest way I see to make that
>> happen is to decouple the host and db identities in such a way that
>> this is a reality. It's possible there's something elegant we could
>> pull from things like Merkle trees. I've spent time considering it and
>> haven't thought of anything but I'd be tickled pink if there were a
>> reasonable solution there.
> Yeah. That is why I keep thinking of a checksum that works well with
> incremental map/reduce. I always recall that CRC32 is a commutative,
> associative checksum algorithm. It could hypothetically give you a
> checksum of the entire tree, and all subtrees down to the leaves, as a
> Couch reduce function. So the idea is to reduce the by_seq index. You
> get checksums of the database or subsets free or cheap.
> At this point I am out of my expertise though so I defer.
> -- 
> Iris Couch

Yep, that's a Merkle tree, and brings us back to where this thread sat 24 hours ago.  A couple
of points:

* You want to stuff the checksums in the id_tree, not the seq_tree.  If you use the seq_tree
you'll never be able to apply updates that get the checksums aligned.

* Merkle trees are great for two-way synchronization, but it's not immediately clear to me
how you'd use them to bootstrap a single source -> target replication.  I might just be
missing a straightforward extension of the tech here.
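
To make the first point concrete, here is a minimal sketch in Python rather than anything from the CouchDB codebase. It assumes that update sequence numbers are local to each database and that a replica can be modeled as a dict mapping doc ID to a (seq, rev) pair. The names `merkle_root`, `id_tree_root`, and `seq_tree_root` are hypothetical. Under those assumptions, two replicas holding the same documents at the same revisions agree on a checksum keyed by doc ID, but not on one keyed by sequence:

```python
import hashlib


def h(*parts):
    """Hash a sequence of strings into one hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return digest.hexdigest()


def merkle_root(entries):
    """Root hash of a binary Merkle tree over an ordered list of (key, value) pairs."""
    level = [h(key, value) for key, value in entries]
    if not level:
        return h()
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            left = level[i]
            # An odd node at the end of a level is paired with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            paired.append(h(left, right))
        level = paired
    return level[0]


def id_tree_root(db):
    """Checksum keyed by doc ID (hypothetical stand-in for the id_tree)."""
    return merkle_root([(doc_id, rev) for doc_id, (_seq, rev) in sorted(db.items())])


def seq_tree_root(db):
    """Checksum keyed by local update sequence (hypothetical stand-in for the seq_tree)."""
    by_seq = sorted((seq, doc_id, rev) for doc_id, (seq, rev) in db.items())
    return merkle_root([(f"{seq}:{doc_id}", rev) for seq, doc_id, rev in by_seq])


# Same documents and revisions, but each replica assigned its own local seqs.
replica_a = {"alice": (1, "1-aaa"), "bob": (2, "2-bbb")}
replica_b = {"bob": (3, "2-bbb"), "alice": (4, "1-aaa")}

print(id_tree_root(replica_a) == id_tree_root(replica_b))    # True
print(seq_tree_root(replica_a) == seq_tree_root(replica_b))  # False
```

The sketch rebuilds the whole tree on every call. It says nothing about how the checksums would be maintained incrementally inside the b-tree, nor does it address the one-way bootstrap question in the second point.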
