From: Stephan Wehner
To: dev@couchdb.apache.org
Subject: Re: multi-level views
Date: Wed, 3 Jun 2009 22:43:21 -0700
In-Reply-To: <1D7D2ECC-A0F7-4E04-B1FA-299132A1B1B3@geni.com>

On Tue, Jun 2, 2009 at 7:20 PM, Justin Balthrop wrote:
> Hi everyone,
>
> I've been reading the dev and user mailing lists for the past month or so,
> but haven't posted yet. I've fallen in love with couchdb, its power and
> simplicity, and I tell everyone who will listen why it is so much better
> than a relational db for most applications. I now have most of the
> engineering team at our company on board, and I'm in the process of
> converting our rails site from postgres to couchdb.
>
> So, after spending a few weeks converting models over to using couchdb,
> there is one feature that we are desperately missing:
>
> Multi-level map-reduce in views.
>
> We need a way to take the output of reduce and pass it back through another
> map-reduce step (multiple times in some cases). This way, we could build
> map-reduce flows that compute (and cache) any complex data computation we
> need.
>
> Our specific use case isn't incredibly important, because multi-level
> map-reduce could be useful in countless ways, but I'll include it anyway
> just as illustration. The specific need for us arose from the desire to
> slice up certain very large documents to make concurrent editing by a huge
> number of users feasible. Then we started to use a view step to combine the
> data back into whole documents.
> This worked really well at first, but we
> soon found that we needed to run additional queries on those documents. So
> we were stuck with either:

Hey there,

Would you mind explaining what those additional queries are?

Stephan

>
> 1) do the queries in the client - meaning we lose all the power and caching
> of couchdb views; or
> 2) reinsert the combined documents into another database - meaning we are
> storing the data twice, and we still have to deal with contention when
> modifying the compound documents in that database.
>
> Multi-level map-reduce would solve this problem perfectly!
>
> Multi-level views could also simplify and improve performance for reduce
> grouping. The reduce itself would work just like Google's map-reduce by only
> reducing values that have the exact same map key. Then if you want to reduce
> further, you can just use another map-reduce step on top of that with the
> map emitting a different key so the reduce data will be grouped differently.
> For example, if you wanted a count of posts per user and total posts, you
> would implement it as a two-level map-reduce with the key=user_id for map1
> and the key=null for map2.
>
> This way, you only calculate reduce values for groupings you care about, and
> any particular reduce value is immediately available from the cached B+tree
> values without further computation. There is more burden on the user to
> specify ahead of time which groupings they need, but the performance and
> flexibility would be well worth it. This eliminates the need to store reduce
> values internally in the map B+tree. But it does mean that you would need a
> B+tree for each reduce grouping to keep incremental reduce updates fast. The
> improved performance comes from the fact that view queries would never need
> to aggregate reduce values across multiple nodes or do any re-reducing.
>
> Does this make sense? What do you guys think? Have you discussed the
> possibility of such a feature?
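
To make the two-level example above concrete (posts per user, then total
posts), here is a rough sketch in plain Python. It is not CouchDB view code
and says nothing about how this would be implemented in the internals: the
map_reduce helper, map1, map2, and the sample posts are all made up for
illustration, and nothing here is cached or updated incrementally.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Hypothetical helper: group emitted values by exact key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


# Made-up sample documents, only for illustration.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]


def map1(doc):
    # Level 1: key = user_id, so the reduce yields one count per user.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: input is a (user_id, count) row from the level-1 reduce.
    # key = None (null), so every count lands in a single group.
    _user_id, count = row
    yield None, count


posts_per_user = map_reduce(posts, map1, sum)
total_posts = map_reduce(posts_per_user.items(), map2, sum)

print(posts_per_user)  # {'alice': 2, 'bob': 1}
print(total_posts)     # {None: 3}
```

The second level only ever sees the reduced rows of the first level, which
is the point of the proposal as I read it: each grouping is its own reduce
output rather than a re-reduce over the first one.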
>
> I'd be happy to discuss it further and even help with the implementation,
> though I've only done a little bit of coding in Erlang. I'm pretty sure this
> would mean big changes to the couchdb internals, so I want to get your
> opinions and criticisms before I get my hopes up or dive into any coding.
>
> Cheers,
> Justin Balthrop
>

-- 
Stephan Wehner

-> http://stephan.sugarmotor.org (blog and homepage)
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.org -- http://blog.stephansmap.org