Subject: Re: Reducible checksum?
From: Chris Anderson
To: user@couchdb.apache.org, Chris Anderson
Date: Thu, 10 Dec 2009 09:34:18 -0800

On Wed, Dec 9, 2009 at 1:31 AM, Brian Candler wrote:
> On Mon, Dec 07, 2009 at 02:27:18PM -0800, Chris Anderson wrote:
>> If there's a generic way to do this, and it
>> is cheap enough, it could be generalized to handle view etags. Your
>> row count + max timestamp trick seems sensible to me, but obviously is
>> not generalizable.
>>
>> Presumably you could avoid hashing the keys and values by leaning on
>> the document._rev. However, that just pushes the problem back a step.
>
> Aha. These days the _rev includes a cryptographic checksum of the document
> contents, correct?
>
> In that case I think all we need is a simple sum, modulo 2^128, of the _rev
> hashes. This is commutative and associative, and very cheap. (An XOR has the
> problem that if the same document is emitted an even number of times, it
> vanishes).
>
> Now, we know that the set of (k,v) pair(s) emitted for any particular doc
> can't change unless you modify the design doc. So we could simply add this
> sum to a hash of the design doc as a final step, to get the etag. You'd also
> have to include the view params in the final hash (e.g. startkey, endkey,
> skip) because it's possible that the same set of docs could appear under
> different parts of the key space, emitting different keys and/or values, and
> you'd want different etags for those.
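A minimal Python sketch of the reduction Brian describes, assuming revs of the form "N-<32 hex digits>" whose hex part is a 128-bit MD5 digest. The helper names (rev_hash, sum_revs, combine_partials, xor_revs, view_etag) are hypothetical, and feeding the final sum, a design-doc hash and the query params through MD5 is one possible way to do the "final step", not something the thread specifies.

```python
import hashlib
import json

MOD = 2 ** 128  # sums wrap modulo 2^128, as proposed in the thread


def rev_hash(rev: str) -> int:
    """Interpret the digest part of a rev like '3-abc...' as an integer.

    Assumption: the text after the first '-' is a 128-bit MD5 in hex.
    """
    _, digest = rev.split("-", 1)
    return int(digest, 16)


def sum_revs(revs) -> int:
    """Commutative, associative checksum: sum of rev hashes mod 2^128."""
    total = 0
    for rev in revs:
        total = (total + rev_hash(rev)) % MOD
    return total


def combine_partials(partials) -> int:
    """Rereduce step: partial sums (e.g. per B-tree node) combine the same way."""
    return sum(partials) % MOD


def xor_revs(revs) -> int:
    """XOR variant, shown only to illustrate the cancellation problem."""
    acc = 0
    for rev in revs:
        acc ^= rev_hash(rev)
    return acc


def view_etag(rev_sum: int, ddoc_hash: str, params: dict) -> str:
    """Hypothetical final step: mix in the design doc hash and view params."""
    h = hashlib.md5()
    h.update(rev_sum.to_bytes(16, "big"))
    h.update(ddoc_hash.encode("utf-8"))
    # sort_keys makes the params encoding independent of dict order
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()
```

Because addition mod 2^128 is commutative and associative, sum_revs returns the same value for any row order, and sum_revs(left + right) equals combine_partials([sum_revs(left), sum_revs(right)]), which is the property a reduce/rereduce needs. By contrast, xor_revs([r, r]) is 0 for any rev r, which is the vanishing case Brian points out.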
>
> This leaves one question: I know the view index already contains
> {{key,_id},value}, but does it include _rev? If not, would the extra storage
> overhead be acceptable?

There have been requests for _rev in the view rows in the past, but we've
always responded with the recommendation to just emit the rev as part of
your value. If we were to use a solution like this for the view etags we'd
want to hard-code it, but for now you should be able to prototype your
approach with the current API.

>
> The alternative I can see is to take a hash of each {{key,_id},value} and to
> sum those hashes. The trouble is this is more expensive computation-wise,
> especially if the value is large (e.g. when you emit the whole document).
>
> Cheers,
>
> Brian.
>

-- 
Chris Anderson
http://jchrisa.net
http://couch.io
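To prototype this with the current API as suggested above, a map function can emit the rev as the value (for example emit(doc.type, doc._rev)) and a client can reduce the returned rows itself. The Python sketch below assumes rows shaped like the JSON a view query returns ("id", "key", "value") and, for the first helper, that every value is a rev string. It also sketches Brian's alternative of hashing each {{key,_id},value} triple; MD5 over a compact, key-sorted JSON encoding is an arbitrary stand-in for whatever canonical form a real implementation would choose. Both helper names are hypothetical.

```python
import hashlib
import json

MOD = 2 ** 128


def rev_sum_from_rows(rows) -> int:
    """Sum of rev digests mod 2^128.

    Assumes the map function emitted the doc's _rev as the row value.
    """
    total = 0
    for row in rows:
        digest = row["value"].split("-", 1)[1]  # drop the "N-" prefix
        total = (total + int(digest, 16)) % MOD
    return total


def row_hash_sum(rows) -> int:
    """Alternative: hash each [[key, id], value] triple and sum the hashes.

    Cost grows with the size of each emitted value, since every value is
    serialized and hashed.
    """
    total = 0
    for row in rows:
        blob = json.dumps([[row["key"], row["id"]], row["value"]],
                          sort_keys=True, separators=(",", ":"))
        digest = hashlib.md5(blob.encode("utf-8")).hexdigest()
        total = (total + int(digest, 16)) % MOD
    return total
```

Moving a row to a different key changes row_hash_sum but leaves rev_sum_from_rows unchanged; that is why the rev-only sum relies on the design doc hash and query params being mixed in, whereas row_hash_sum pays to serialize and hash every emitted value.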