From: Justin Balthrop
To: dev@couchdb.apache.org
Subject: multi-level views
Date: Tue, 2 Jun 2009 19:20:14 -0700

Hi everyone,

I've been reading the dev and user mailing lists for the past month or so, but haven't posted yet. I've fallen in love with couchdb, its power and simplicity, and I tell everyone who will listen why it is so much better than a relational db for most applications. I now have most of the engineering team at our company on board, and I'm in the process of converting our rails site from postgres to couchdb.

So, after spending a few weeks converting models over to using couchdb, there is one feature that we are desperately missing: Multi-level map-reduce in views.

We need a way to take the output of reduce and pass it back through another map-reduce step (multiple times in some cases). This way, we could build map-reduce flows that compute (and cache) any complex data computation we need.

Our specific use case isn't incredibly important, because multi-level map-reduce could be useful in countless ways, but I'll include it anyway just as illustration. The specific need for us arose from the desire to slice up certain very large documents to make concurrent editing by a huge number of users feasible. Then we started to use a view step to combine the data back into whole documents. This worked really well at first, but we soon found that we needed to run additional queries on those documents.
So we were stuck with either: 1) doing the queries in the client - meaning we lose all the power and caching of couchdb views; or 2) reinserting the combined documents into another database - meaning we are storing the data twice, and we still have to deal with contention when modifying the compound documents in that database. Multi-level map-reduce would solve this problem perfectly!

Multi-level views could also simplify and improve performance for reduce grouping. The reduce itself would work just like Google's map-reduce by only reducing values that have the exact same map key. Then if you want to reduce further, you can just use another map-reduce step on top of that with the map emitting a different key so the reduce data will be grouped differently. For example, if you wanted a count of posts per user and total posts, you would implement it as a two-level map-reduce with the key=user_id for map1 and the key=null for map2.

This way, you only calculate reduce values for groupings you care about, and any particular reduce value is immediately available from the cached B+tree values without further computation. There is more burden on the user to specify ahead of time which groupings they need, but the performance and flexibility would be well worth it. This eliminates the need to store reduce values internally in the map B+tree. But it does mean that you would need a B+tree for each reduce grouping to keep incremental reduce updates fast. The improved performance comes from the fact that view queries would never need to aggregate reduce values across multiple nodes or do any re-reducing.

Does this make sense? What do you guys think? Have you discussed the possibility of such a feature?

I'd be happy to discuss it further and even help with the implementation, though I've only done a little bit of coding in Erlang.
I'm pretty sure this would mean big changes to the couchdb internals, so I want to get your opinions and criticisms before I get my hopes up or dive into any coding.

Cheers,
Justin Balthrop
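
To make the posts-per-user example above a bit more concrete, here is a rough Python sketch of a two-level flow. It is only an in-memory illustration of the idea, not couchdb API: the `map_reduce` helper, the `map1`/`map2` functions and the sample post documents are all hypothetical, and a real implementation would keep each level in its own B+tree and update it incrementally rather than recomputing from scratch.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Run one map-reduce level (hypothetical helper, not a couchdb call).

    Values are grouped by their exact emitted key, then each group is reduced.
    The result is a list of (key, reduced_value) rows that can be fed
    straight into another level.
    """
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return [(key, reduce_fn(values)) for key, values in groups.items()]


# Made-up post documents, for illustration only.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]


def map1(doc):
    # Level 1: key = user_id, so the reduce yields a count of posts per user.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: key = None (null), so every level-1 row lands in one group.
    _user_id, count = row
    yield None, count


posts_per_user = map_reduce(posts, map1, sum)
total_posts = map_reduce(posts_per_user, map2, sum)

print(posts_per_user)  # [('alice', 2), ('bob', 1)]
print(total_posts)     # [(None, 3)]
```

Each level only ever reduces values that share an exact key, so a different grouping is expressed by adding another level with a different map key instead of re-reducing at query time.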