Date: Mon, 22 Jun 2009 10:07:35 +0100
From: Brian Candler
To: Daniel Trümper
Cc: user@couchdb.apache.org
Subject: Re: 'Grouping' documents so that a set of documents is passed to the view function

On Fri, Jun 19, 2009 at 09:43:31AM +0200, Daniel Trümper wrote:
> Hi,
>
> I am somewhat new to CouchDB but have been doing some stuff with it and
> this is my first post to the list so pardon if I am wrong :)
>
>
>> It would be really cool if there were some way to pass all the docs
>> with a value of 1 for group_key to a single map function call, so I
>> could do computation across those related documents and emit the
>> results ... I'm just using the magic group_key attribute as an
>> example, if such a feature were to actually be made I'd think you'd
>> define a javascript function which returned a single grouping k to
>> exist I
>
> I think this is what the reduce function is for.

No, I'm afraid it's not.

The OP wants to calculate information across a group of related
documents. CouchDB does not guarantee that all the related documents
will be passed to the reduce function at the same time. It may pass
documents (d1,d2,d3) to the reduce function to generate Rx, then pass
(d4,d5,d6) to generate Ry, then (d7,d8,d9) to generate Rz, and then pass
(Rx,Ry,Rz) to the reduce function again, in re-reduce mode, to generate
the final value R. If the documents sharing the key were, say, d3 and
d4, you won't be able to process them together, because they are never
presented to the reduce function in the same call.

Using a grouped reduce query is better (i.e.
group=true), but a large set of documents sharing the same group key is
still likely to be split across several reductions followed by a
re-reduce. The OP was talking about ~100 documents sharing this key, so
they may well be split this way.

Furthermore, CouchDB optimises its reductions by storing the reduced
value for all the documents within the same Btree node. For example,
suppose you have:

    +-------------+ +-------------+ +-------------+
    | d1 d2 d3 Rx | | d4 d5 d6 Ry | | d7 d8 d9 Rz |
    +-------------+ +-------------+ +-------------+

Now you make a reduce query for the key range that includes documents
d2 to d8 inclusive (or a grouped query where d2 to d8 share the same
group key). CouchDB will calculate:

    R1 = Reduce(d2,d3)
    R2 = Reduce(d7,d8)
    R  = Rereduce(R1,Ry,R2)

That is, the already-reduced value Ry = Reduce(d4,d5,d6) is reused
without recomputation, so the reduce function never sees documents d4
to d6 during this query.

So, in summary: you cannot rely on the reduce function being given all
the adjacent documents in one call. You *must* do this sort of
processing client-side.

HTH,

Brian.
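A minimal sketch of the client-side approach, in Python. It assumes a view whose map function emits each document's group_key as the key, queried with include_docs=true; that view, the `seq` field and the helper names below are hypothetical, and fetching the rows over HTTP is left out. Because CouchDB returns view rows sorted by key, all the rows for one group arrive next to each other, so the client can collect each run and process it as a whole.

```python
from itertools import groupby
from operator import itemgetter


def groups_from_view_rows(rows):
    """Yield (key, [doc, ...]) for each run of rows sharing the same key.

    Assumes `rows` is the "rows" list from a view queried with
    include_docs=true, so each row has "key" and "doc". View rows come
    back sorted by key, so every row for one key is adjacent.
    """
    for key, run in groupby(rows, key=itemgetter("key")):
        yield key, [row["doc"] for row in run]


def summarise_group(docs):
    """Hypothetical cross-document computation over one whole group.

    Returns the gaps between consecutive `seq` values. A reduce function
    could not compute this reliably, since a group may be split across
    several reduce calls.
    """
    seqs = sorted(doc["seq"] for doc in docs)
    return [b - a for a, b in zip(seqs, seqs[1:])]


# Made-up sample rows in the shape of a view response.
SAMPLE_ROWS = [
    {"id": "d1", "key": 1, "value": None, "doc": {"_id": "d1", "group_key": 1, "seq": 3}},
    {"id": "d2", "key": 1, "value": None, "doc": {"_id": "d2", "group_key": 1, "seq": 7}},
    {"id": "d3", "key": 2, "value": None, "doc": {"_id": "d3", "group_key": 2, "seq": 5}},
]

if __name__ == "__main__":
    for key, docs in groups_from_view_rows(SAMPLE_ROWS):
        print(key, summarise_group(docs))  # prints "1 [4]" then "2 []"
```

Unlike a reduce function, summarise_group always sees every document in a group at once. With a large result set you would page through the view, taking care not to split one key's rows across two pages.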