couchdb-replication mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject Re: rev hash stability
Date Fri, 17 Oct 2014 23:17:32 GMT

> On Oct 17, 2014, at 2:22 PM, Brian Mitchell <brian@standardanalytics.io> wrote:
> 
> Giving revs meaning outside of this scope is likely to bring up more meta
> discussion about the CouchDB data model and a long history of
> undocumented choices which only manifest in the particular
> implementation we have today.

That does appear to be a danger. I'm not interested in bike-shedding; if the Apache CouchDB
community can't make progress on this issue then we can discuss it elsewhere to come up with
solutions. I can't speak for Chris, but I'm here as a courtesy and because I believe interoperability
is important. But I believe making progress is more important.

Back to the matter at hand: experience from a long line of P2P systems (from FreeNet onwards)
shows the value of giving pieces of distributed content their own unique and unforgeable IDs.
CouchDB-style revision IDs partly meet this need, except that:
(a) there are interoperability issues because every implementation has its own algorithm for
generating the IDs;
(b) none of the current ones are truly unforgeable, because they use the broken MD5 hash instead
of something like SHA-256;
(c) the unforgeability isn't verified because the replicator doesn't check that a revision's
ID matches its contents.

At some point we may need to take this more seriously, since Couchbase would like to build
P2P systems in the future. At that point it becomes necessary to have a canonical rev-ID
generation algorithm that is enforced by the replicator. That algorithm will need to be standardized
for interoperability purposes, since otherwise two implementations would reject each other's
revisions as forgeries.
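
To make the idea concrete, here is a minimal sketch of what a content-derived rev ID and the
matching replicator-side check might look like. It is illustrative only and is not the algorithm
of CouchDB or any other implementation. The function names, the choice of SHA-256, the
sorted-key JSON encoding, and the way the parent rev ID is mixed into the digest are all
assumptions made for this example.

```python
import hashlib
import json


def canonical_json(body):
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    # A real standard would have to pin down number and string encoding as well.
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def make_rev_id(generation, parent_rev, body):
    # Hypothetical scheme: digest over the parent rev ID plus the canonical body.
    h = hashlib.sha256()
    h.update((parent_rev or "").encode("utf-8"))
    h.update(b"\x00")  # separator between the parent rev ID and the body
    h.update(canonical_json(body))
    return f"{generation}-{h.hexdigest()}"


def verify_rev(rev_id, parent_rev, body):
    # What a replicator could do on receipt: recompute the ID and compare.
    generation_str, _, _ = rev_id.partition("-")
    try:
        generation = int(generation_str)
    except ValueError:
        return False
    return make_rev_id(generation, parent_rev, body) == rev_id
```

With a scheme of this shape, two implementations agree on a revision's ID only if they agree on
every detail of the canonical encoding, which is why the encoding itself would need to be part of
the standard.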

That's why I see this issue as important.

—Jens