couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brian Candler <B.Cand...@pobox.com>
Subject Re: Reducible checksum?
Date Wed, 09 Dec 2009 09:31:28 GMT
On Mon, Dec 07, 2009 at 02:27:18PM -0800, Chris Anderson wrote:
> If there's a generic way to do this, and it
> is cheap enough, it could be generalized to handle view etags. Your
> row count + max timestamp trick seems sensible to me, but obviously is
> not generalizable.
> 
> Presumably you could avoid hashing the keys and values by leaning on
> the document._rev. However, that just pushes the problem back a step.

Aha. These days the _rev includes a cryptographic checksum of the document
contents, correct?

In that case I think all we need is a simple sum, modulo 2^128, of the _rev
hashes. This is commutative and associative, and very cheap. (An XOR has the
problem that if the same document is emitted an even number of times, it
vanishes).

Now, we know that the set of (k,v) pair(s) emitted for any particular doc
can't change unless you modify the design doc. So we could simply add this
sum to a hash of the design doc as a final step, to get the etag. You'd also
have to include the view params in the final hash (e.g. startkey, endkey,
skip) because it's possible that the same set of docs could appear under
different parts of the key space, emitting different keys and/or values, and
you'd want different etags for those.

This leaves one question: I know the view index already contains
{{key,_id},value}, but does it include _rev? If not, would the extra storage
overhead be acceptable?

The alternative I can see is to take a hash of each {{key,_id},value} and to
sum those hashes. The trouble is this is more expensive computation-wise,
especially if the value is large (e.g. when you emit the whole document).

Cheers,

Brian.

Mime
View raw message