Date: Mon, 7 Dec 2009 10:34:53 +0000
From: Brian Candler
To: user@couchdb.apache.org
Subject: Reducible checksum?

I am thinking about storing some derived data associated with key ranges of a view (for example, an image that gives a graphical summary of a key range). I would like to determine when it's time to regenerate an image, that is, when the underlying view has changed within that range.

One thought I had was a reduce function that computes some sort of checksum of the key/value pairs. Then I could just do a reduce query across the key range and see whether the reduce value has changed. It would act like an ETag for the range.

Unfortunately, I can't do something as simple as an md5sum across the range, because CouchDB builds a tree of reduces and re-reduces, and may decide to restructure that tree. I'd like a checksum that is invariant across all possible reduce trees for the same data.
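To make the reduce-tree problem concrete, here is a minimal Python simulation (not CouchDB's JavaScript view API) of a naive "md5 over the range" reduce. The names `naive_md5_reduce` and `run_tree` are hypothetical; the `(keys, values, rereduce)` signature mirrors CouchDB's reduce convention, with `keys` set to `None` on rereduce. Splitting the same three rows into two different tree shapes gives two different results:

```python
import hashlib
import json


def naive_md5_reduce(keys, values, rereduce):
    """Hypothetical 'md5 over the range' reduce, simulated in Python.

    On the first pass it hashes the rows in order; on rereduce it hashes
    the child digests in order. The two levels are unrelated, so the final
    digest depends on where the rows were split.
    """
    h = hashlib.md5()
    items = values if rereduce else list(zip(keys, values))
    for item in items:
        h.update(json.dumps(item).encode("utf-8"))
    return h.hexdigest()


def run_tree(reduce_fn, rows, split):
    """Simulate one reduce tree: reduce rows[:split] and rows[split:]
    separately, then rereduce the two partial results."""
    parts = [rows[:split], rows[split:]]
    partials = [reduce_fn([k for k, _ in p], [v for _, v in p], False) for p in parts]
    return reduce_fn(None, partials, True)


rows = [("a", 1), ("b", 2), ("c", 3)]
# Same rows, two tree shapes: ([a], [b, c]) versus ([a, b], [c]).
print(run_tree(naive_md5_reduce, rows, 1) == run_tree(naive_md5_reduce, rows, 2))  # False
```

Any invariant checksum therefore needs a combining step that is commutative and associative, so that the grouping chosen by the B-tree cannot affect the answer.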
Something simple would be to XOR all the keys and values together, but that would miss changes that happen to XOR to the same result. Perhaps I should md5 each (key, value) pair and then XOR all those digests together in the reduce function.

Alternatively, since my docs have "updated" timestamps, maybe I should just take the max() of the updated timestamp across the docs, together with a count of the docs (so that deletions can be detected).

I just wondered whether anyone had already come up with an elegant solution for this? Or some completely different way of determining whether a view has changed between a given startkey and endkey?

Thanks,

Brian.
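The two ideas in the email (XOR of per-row md5 digests, and max(updated) plus a count) can be sketched as a Python simulation of the reduce/rereduce calls. This is a sketch under stated assumptions, not CouchDB code: real view functions are JavaScript, CouchDB passes `[key, docid]` pairs as `keys`, and JavaScript bitwise operators work on 32 bits, so a JS port would have to XOR the 128-bit digest in chunks. `checksum_reduce`, `max_count_reduce`, `row_digest` and `run_tree` are hypothetical names, and the row values are assumed to be each doc's `updated` timestamp.

```python
import hashlib
import json
from functools import reduce as fold
from operator import xor


def row_digest(key, value):
    """md5 of one (key, value) row as a 128-bit integer.

    Assumption: keys and values are JSON-serialisable; canonical JSON
    (sorted object keys, no whitespace) makes equal rows hash identically.
    """
    data = json.dumps([key, value], sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.md5(data.encode("utf-8")).digest(), "big")


def checksum_reduce(keys, values, rereduce):
    """XOR of per-row md5 digests plus a row count.

    XOR is commutative and associative, so every grouping of the same rows
    gives the same result. The count catches pairs of identical rows,
    which cancel out under XOR.
    """
    if rereduce:
        # `values` are partial results from earlier reduce calls.
        return {
            "xor": fold(xor, (v["xor"] for v in values), 0),
            "count": sum(v["count"] for v in values),
        }
    return {
        "xor": fold(xor, (row_digest(k, v) for k, v in zip(keys, values)), 0),
        "count": len(values),
    }


def max_count_reduce(keys, values, rereduce):
    """max(updated) plus a doc count.

    Assumption: each row's value is the doc's `updated` timestamp as an
    ISO 8601 UTC string, which sorts chronologically as plain text.
    """
    if rereduce:
        return {
            "max": max(v["max"] for v in values),
            "count": sum(v["count"] for v in values),
        }
    return {"max": max(values), "count": len(values)}


def run_tree(reduce_fn, rows, split):
    """Simulate one reduce tree: reduce rows[:split] and rows[split:]
    separately (both must be non-empty), then rereduce the two results."""
    parts = [rows[:split], rows[split:]]
    partials = [reduce_fn([k for k, _ in p], [v for _, v in p], False) for p in parts]
    return reduce_fn(None, partials, True)


# Usage: different tree shapes over the same rows agree.
rows = [
    ("2009-12-05", "2009-12-05T10:00:00Z"),
    ("2009-12-06", "2009-12-06T11:00:00Z"),
    ("2009-12-07", "2009-12-07T09:30:00Z"),
]
print(run_tree(checksum_reduce, rows, 1) == run_tree(checksum_reduce, rows, 2))  # True
print(run_tree(max_count_reduce, rows, 1) == run_tree(max_count_reduce, rows, 2))  # True
```

The XOR variant detects any change to a row's key or value (up to md5 collisions), whereas the max/count variant relies on every edit bumping `updated` to a newer timestamp than any already in the range.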