couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Reducible checksum?
Date Mon, 07 Dec 2009 22:27:18 GMT
On Mon, Dec 7, 2009 at 2:34 AM, Brian Candler <B.Candler@pobox.com> wrote:
> I am thinking about storing some derived data which is associated with key
> ranges of a view. (Example: an image which provides a graphical summary of a
> key range).
>
> I would like to determine when it's time to regenerate an image, that is,
> when the underlying view has changed within that range.
>
> One thought I had was if I could make a reduce function which was some sort
> of checksum of the key/value pairs. Then I could just do a reduce query
> across the key range, and see if the reduce value has changed. It would be
> like an etag for the range.
>
> Unfortunately, I can't just do something simple like an md5sum across the
> range, because couchdb implements a tree of reduces and re-reduces, and may
> decide to restructure this tree. I'd like a checksum which is invariant
> across all possible reduce trees for the same data.
>
> Something simple would be to XOR all the keys and values together, but
> sometimes this would not detect changes which happen to XOR to the same
> data.
>
> Perhaps I should md5 each (key,value) pair, and then XOR all those together
> in the reduce function.
>
> Since my docs have updated timestamps, maybe I should just take the max() of
> the updated timestamp for each doc, together with a count of the docs (so as
> to be able to detect deletions).
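Brian's md5-then-XOR idea can be simulated outside CouchDB to see why it does not depend on the shape of the reduce tree. This is a Python sketch, not a CouchDB view: `reduce_rows` and `rereduce` are hypothetical stand-ins for a reduce function's two modes, and encoding each row as canonical JSON before hashing is an assumption. In a real JavaScript reduce function the 128-bit digest would have to travel as something like a hex string, because JSON numbers cannot hold it exactly, and an md5 implementation would have to be available to the function.

```python
import hashlib
import json
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # one view row as (key, value)

def row_digest(key: Any, value: Any) -> int:
    # md5 of a canonical JSON encoding of the row, read as a 128-bit integer.
    data = json.dumps([key, value], sort_keys=True, separators=(",", ":")).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def reduce_rows(rows: List[Row]) -> int:
    # Leaf reduce: XOR together the digests of the rows in one B-tree node.
    acc = 0
    for key, value in rows:
        acc ^= row_digest(key, value)
    return acc

def rereduce(partials: List[int]) -> int:
    # Rereduce: XOR is commutative and associative, so combining partial
    # results in any grouping or order gives the same final value.
    acc = 0
    for partial in partials:
        acc ^= partial
    return acc

rows = [("2009-12-01", 3), ("2009-12-02", 5), ("2009-12-03", 1), ("2009-12-04", 8)]

# Two different "reduce trees" over the same four rows.
tree_a = rereduce([reduce_rows(rows[:2]), reduce_rows(rows[2:])])
tree_b = rereduce([
    reduce_rows(rows[3:]),
    rereduce([reduce_rows(rows[:1]), reduce_rows(rows[1:3])]),
])
```

Both groupings yield the same value as reducing all four rows at once, while changing any single row changes the result (barring an md5 collision).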

This is a great question. If there's a generic way to do this, and it
is cheap enough, it could be generalized to handle view etags. Your
row count + max timestamp trick seems sensible to me, but it obviously
doesn't generalize, since it depends on your docs carrying an updated
timestamp.
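The count-plus-latest-timestamp reduce has to work in both reduce and rereduce modes. A minimal Python sketch of the two modes, assuming each row's value is the document's `updated` timestamp as an ISO 8601 UTC string (so that string order matches time order); the function names are hypothetical:

```python
from typing import List, Optional, TypedDict

class Summary(TypedDict):
    count: int
    max_updated: Optional[str]

def reduce_values(updated: List[str]) -> Summary:
    # Leaf reduce: each value is one doc's "updated" timestamp.
    return {"count": len(updated), "max_updated": max(updated, default=None)}

def rereduce(parts: List[Summary]) -> Summary:
    # Counts add up; the max of the partial maxima is the overall max.
    stamps = [p["max_updated"] for p in parts if p["max_updated"] is not None]
    return {"count": sum(p["count"] for p in parts), "max_updated": max(stamps, default=None)}

def needs_regeneration(old: Summary, new: Summary) -> bool:
    # Any change in either field means the key range has changed.
    return old != new

stamps = ["2009-12-01T10:00:00Z", "2009-12-05T09:30:00Z", "2009-12-07T02:34:00Z"]
before = rereduce([reduce_values(stamps[:1]), reduce_values(stamps[1:])])
# Deleting the middle doc leaves max_updated unchanged, but the count drops.
after_delete = rereduce([reduce_values(stamps[:1]), reduce_values(stamps[2:])])
```

The count is what catches the deletion here; the timestamp catches edits and insertions, provided every write bumps `updated`.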

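Presumably you could avoid hashing the keys and values by leaning on
each document's _rev. However, that just pushes the problem back a
step: you still have to combine the revs in a way that doesn't depend
on the shape of the reduce tree.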
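The `_rev` approach, and the combining problem it leaves behind, can be sketched in Python. Here each view row is represented only by the `_id` and `_rev` of the document that emitted it (an assumption for illustration; the rev strings are made up), and the helper names are hypothetical. One wrinkle shows up immediately: a document that emits two rows in the range contributes the same digest twice, and under XOR the pair cancels. Addition modulo 2^128 is a swapped-in alternative combiner that avoids that particular cancellation.

```python
import hashlib
from typing import Iterable, Tuple

MASK = (1 << 128) - 1  # keep sums within 128 bits

def rev_digest(doc_id: str, rev: str) -> int:
    # Hypothetical row digest built from the source doc's _id and _rev rather
    # than from the emitted key/value; any edit to the doc yields a new _rev.
    data = f"{doc_id}\x00{rev}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def xor_checksum(rows: Iterable[Tuple[str, str]]) -> int:
    acc = 0
    for doc_id, rev in rows:
        acc ^= rev_digest(doc_id, rev)
    return acc

def sum_checksum(rows: Iterable[Tuple[str, str]]) -> int:
    # Addition modulo 2**128 is also commutative and associative, but a
    # repeated digest does not cancel itself out the way it does under XOR.
    acc = 0
    for doc_id, rev in rows:
        acc = (acc + rev_digest(doc_id, rev)) & MASK
    return acc

# One doc emitting two rows in the range contributes the same digest twice.
one_doc_two_rows = [("doc1", "1-abc"), ("doc1", "1-abc")]
```

Under XOR that range is indistinguishable from an empty one; the modular sum keeps them apart. Either way the check is conservative: an edit that leaves the emitted rows untouched still produces a new `_rev` and therefore a new checksum.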

What we need for a general solution is a commutative and associative
checksum function, which would be a funny beast indeed.
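Whether a candidate combining function is commutative and associative can be spot-checked on sample digests. This Python sketch (helper names hypothetical) compares XOR, addition modulo 2^128, and an order-sensitive md5 of the concatenated digests; a sample-based check can only find counterexamples, not prove the laws.

```python
import hashlib
import itertools
from typing import Callable, List

MASK = (1 << 128) - 1

def md5_int(data: bytes) -> int:
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def xor_combine(a: int, b: int) -> int:
    return a ^ b

def add_combine(a: int, b: int) -> int:
    return (a + b) & MASK

def md5_chain(a: int, b: int) -> int:
    # Order-sensitive: md5 of the two 16-byte digests concatenated.
    return md5_int(a.to_bytes(16, "big") + b.to_bytes(16, "big"))

def looks_commutative_and_associative(
    combine: Callable[[int, int], int], samples: List[int]
) -> bool:
    # Returns False on the first counterexample; True only means none was found.
    for a, b in itertools.product(samples, repeat=2):
        if combine(a, b) != combine(b, a):
            return False
    for a, b, c in itertools.product(samples, repeat=3):
        if combine(combine(a, b), c) != combine(a, combine(b, c)):
            return False
    return True

samples = [md5_int(str(i).encode("ascii")) for i in range(4)]
```

The md5 chain fails because its result depends on the order in which partial results are combined, which is exactly what a restructured reduce tree changes.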

Chris

>
> I just wondered if anyone had already made an elegant solution for this? Or
> some completely different way of determining whether a view has changed
> between a given startkey and endkey?
>
> Thanks,
>
> Brian.
>
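On the client side, the check amounts to one ungrouped reduce query over the key range, compared with the value saved when the image was last generated. A Python sketch of that comparison, assuming a view whose reduce returns the range checksum; the server address, database, design document, and view names are hypothetical, and the HTTP request itself is left out.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode

def range_reduce_url(base: str, db: str, ddoc: str, view: str,
                     startkey: Any, endkey: Any) -> str:
    # View keys are passed as JSON-encoded query parameters.
    path = "/".join([quote(db, safe=""), "_design", quote(ddoc, safe=""),
                     "_view", quote(view, safe="")])
    query = urlencode({"startkey": json.dumps(startkey), "endkey": json.dumps(endkey)})
    return f"{base}/{path}?{query}"

def reduce_value(body: str) -> Optional[Any]:
    # An ungrouped reduce returns at most one row; an empty range returns none.
    rows = json.loads(body)["rows"]
    return rows[0]["value"] if rows else None

def range_changed(stored: Optional[Any], body: str) -> bool:
    # Regenerate the image when the range's reduce value differs from the saved one.
    return reduce_value(body) != stored

url = range_reduce_url("http://localhost:5984", "mydb", "summary", "checksum",
                       ["a"], ["a", {}])
```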



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
