incubator-couchdb-user mailing list archives

From: Brian Candler
Subject: Reducible checksum?
Date: Mon, 07 Dec 2009 10:34:53 GMT

I am thinking about storing some derived data which is associated with key
ranges of a view. (Example: an image which provides a graphical summary of a
key range).

I would like to determine when it's time to regenerate an image, that is,
when the underlying view has changed within that range.

One thought I had was if I could make a reduce function which was some sort
of checksum of the key/value pairs. Then I could just do a reduce query
across the key range, and see if the reduce value has changed. It would be
like an etag for the range.

Unfortunately, I can't just do something simple like an md5sum across the
range, because CouchDB implements a tree of reduces and re-reduces, and may
decide to restructure this tree. I'd like a checksum which is invariant
across all possible reduce trees for the same data.
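
To make the problem concrete, here is a minimal sketch in Python (used only
for illustration; a real CouchDB reduce would be JavaScript). The helper names
leaf_checksum and combine are made up for this example, and the row data is
invented. It hashes the same four rows under two different tree shapes:

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def leaf_checksum(rows):
    # Reduce step: hash the concatenated key/value pairs of one leaf.
    return md5_hex("".join(f"{k}={v};" for k, v in rows))

def combine(checksums):
    # Re-reduce step: hash the concatenated child checksums.
    return md5_hex("".join(checksums))

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

# Same rows, two different tree shapes.
tree_one = combine([leaf_checksum(rows[:2]), leaf_checksum(rows[2:])])
tree_two = combine([leaf_checksum(rows[:1]), leaf_checksum(rows[1:])])

print(tree_one == tree_two)  # False: the result depends on the tree shape
```

The data is identical in both cases, yet the checksum differs, so a change in
the value would not necessarily mean a change in the view.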

Something simple would be to XOR all the keys and values together, but
sometimes this would not detect changes which happen to XOR to the same
value.
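
One such missed change is easy to construct. The sketch below is Python for
illustration only; the to_int encoding (UTF-8 bytes read as an integer) is an
assumption, since the post does not say how keys and values would be turned
into bits. Swapping the values of two keys leaves the XOR untouched:

```python
from functools import reduce

def to_int(item):
    # Encode a key or value as an integer via its UTF-8 bytes (assumed encoding).
    return int.from_bytes(str(item).encode("utf-8"), "big")

def xor_all(rows):
    # XOR every key and every value together.
    items = (item for row in rows for item in row)
    return reduce(lambda acc, item: acc ^ to_int(item), items, 0)

before = [("a", 1), ("b", 2)]
after = [("a", 2), ("b", 1)]  # values swapped between the two keys

print(xor_all(before) == xor_all(after))  # True: the change goes undetected
```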

Perhaps I should md5 each (key,value) pair, and then XOR all those together
in the reduce function.
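
A rough sketch of that idea, again in Python for illustration only and with
made-up helper names (pair_digest, xor_checksum, combine). Because XOR is
associative and commutative, the result does not depend on how the rows are
grouped into a tree:

```python
import hashlib
from functools import reduce as fold

def pair_digest(key, value):
    # md5 of one (key, value) pair as an integer. repr() of the tuple
    # keeps ("ab", "c") distinct from ("a", "bc").
    data = repr((key, value)).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def xor_checksum(rows):
    # Reduce step: XOR the per-pair digests of one leaf.
    return fold(lambda acc, row: acc ^ pair_digest(*row), rows, 0)

def combine(checksums):
    # Re-reduce step: XOR the child checksums.
    return fold(lambda acc, c: acc ^ c, checksums, 0)

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
tree_one = combine([xor_checksum(rows[:2]), xor_checksum(rows[2:])])
tree_two = combine([xor_checksum(rows[:1]), xor_checksum(rows[1:])])
print(tree_one == tree_two)  # True: same value for both tree shapes

# The value swap that plain XOR missed now changes the checksum.
swapped = [("a", 2), ("b", 1), ("c", 3), ("d", 4)]
print(xor_checksum(swapped) == xor_checksum(rows))  # False
```

One caveat: two identical (key, value) rows cancel each other out under XOR.
If the view can emit the same pair from different docs, hashing the doc id
along with the pair is one possible way around that.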

Since my docs have updated timestamps, maybe I should just take the max() of
the updated timestamp for each doc, together with a count of the docs (so as
to be able to detect deletions).
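
A minimal sketch of the max-plus-count idea, in Python for illustration only.
The function mirrors the (keys, values, rereduce) shape of a CouchDB reduce.
It assumes the map emits each doc's updated timestamp as the value, and that
timestamps are ISO 8601 strings in one format so that string comparison
matches time order; the sample timestamps are invented:

```python
def reduce_fn(keys, values, rereduce):
    # On reduce, values are the updated timestamps emitted by the map.
    # On rereduce, values are [max_timestamp, count] partial results.
    if rereduce:
        return [max(v[0] for v in values), sum(v[1] for v in values)]
    return [max(values), len(values)]

stamps = [
    "2009-12-01T09:00:00Z",
    "2009-12-05T14:30:00Z",
    "2009-12-07T10:34:53Z",
]

whole = reduce_fn(None, stamps, False)
split = reduce_fn(
    None,
    [reduce_fn(None, stamps[:1], False), reduce_fn(None, stamps[1:], False)],
    True,
)
print(whole == split)  # True: same result however the tree is split

# Deleting the oldest doc leaves the max alone but changes the count.
print(reduce_fn(None, stamps[1:], False))  # ['2009-12-07T10:34:53Z', 2]
```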

I just wondered if anyone had already made an elegant solution for this? Or
some completely different way of determining whether a view has changed
between a given startkey and endkey?
