Date: Thu, 25 Jun 2009 10:08:39 +0100
From: Brian Candler
To: hhsuper
Cc: user@couchdb.apache.org
Subject: Re: 'Grouping' documents so that a set of documents is passed to the view function
Message-ID:
<20090625090839.GA7316@uk.tiscali.com>
In-Reply-To: <20090625083431.GA7022@uk.tiscali.com>

On Thu, Jun 25, 2009 at 09:34:31AM +0100, Brian Candler wrote:
> Perhaps it will help you to understand this if you consider the limiting
> case where exactly one document is fed into the 'reduce' function at a time,
> and then the outputs of the reduce functions are combined with a large
> re-reduce phase.

Incidentally, this is a partly realistic scenario. It's quite possible, given N documents, that CouchDB will reduce the first N-1, then reduce the last 1, then re-reduce those two values. This might be because of how the documents are split between B-tree nodes, or there may be a limit on the number of documents passed to the reduce function in one go.

This is entirely an implementation issue which you have no control over, so you must write your reduce/rereduce to give the same answer for *any* partitioning of documents.
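The N-1 / 1 split described above can be simulated outside CouchDB. The following is a minimal sketch in Python rather than CouchDB's JavaScript; `reduce_count`, `naive_count` and `split_reduce` are hypothetical names, and the only thing borrowed from CouchDB is the `(keys, values, rereduce)` argument shape of a reduce function.

```python
def reduce_count(keys, values, rereduce):
    """Count documents; on rereduce, `values` holds earlier counts, so add them."""
    if rereduce:
        return sum(values)
    return len(values)


def naive_count(keys, values, rereduce):
    """Broken variant: ignores the rereduce flag and always counts its inputs."""
    return len(values)


def split_reduce(fn, values):
    """Reduce the first N-1 values, then the last one, then re-reduce both results."""
    head = fn(None, values[:-1], False)
    tail = fn(None, values[-1:], False)
    return fn(None, [head, tail], True)


docs = ["a", "b", "c", "d"]  # stand-ins for the mapped values of four documents

single_pass = reduce_count(None, docs, False)
print(single_pass, split_reduce(reduce_count, docs))  # both are 4
print(split_reduce(naive_count, docs))                # 2: counts the two partial results
```

The naive version looks fine when everything arrives in one call and only goes wrong once the input is split, which is why the split cannot be ignored.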
More info at http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

"To make incremental Map/Reduce possible, the Reduce function has the requirement that not only must it be referentially transparent, but it must also be commutative and associative for the array value input, to be able reduce on its own output and get the same answer, like this:

  f(Key, Values) == f(Key, [ f(Key, Values) ] )"

Now, at first glance your re-reduce function appears to satisfy that condition, so perhaps there should be another one: namely, that for any partitioning of Values into subsets Values1, Values2, ... then

  f(Key, Values) == f(Key, [ f(Key,Values1), f(Key,Values2), ... ] )

But I am not a mathematician so I'm not sure if this condition is actually stronger.

Regards,

Brian.
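On whether the partition condition is actually stronger than the quoted equation: a plain average appears to be a case where the quoted equation holds but the partition form does not. This is a small Python sketch, not CouchDB code; the names `mean`, `self_consistent` and `partition_consistent` are made up for illustration, and the key is ignored.

```python
def mean(key, values):
    """Average of the values; the key is unused."""
    return sum(values) / len(values)


def self_consistent(f, key, values):
    """Check f(Key, Values) == f(Key, [f(Key, Values)])."""
    return f(key, values) == f(key, [f(key, values)])


def partition_consistent(f, key, parts):
    """Check f(Key, Values) == f(Key, [f(Key, Values1), f(Key, Values2), ...])."""
    everything = [v for part in parts for v in part]
    return f(key, everything) == f(key, [f(key, part) for part in parts])


values = [1, 2, 3, 10]
parts = [[1, 2, 3], [10]]  # the "first N-1, then the last 1" split

print(self_consistent(mean, None, values))      # True: mean([4.0]) is 4.0
print(partition_consistent(mean, None, parts))  # False: mean([2.0, 10.0]) is 6.0, not 4.0
```

So for this f, at least, the partition condition rejects a function that the single-equation check lets through.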