From: Justin Balthrop
To: dev@couchdb.apache.org
Cc: Viacheslav Seledkin
In-Reply-To: <4A263772.50204@avicomp.com>
Subject: Re: multi-level views
Date: Wed, 3 Jun 2009 12:03:07 -0700

Nice! That sounds like exactly what I'm looking for. I don't think it
will address the performance issues with reduce, but it's definitely a
start.

Do you mind sending a diff of your changes to couch_view_updater.erl? I
diffed your file with trunk and there are a bunch of unrelated changes,
of course.

Thanks

On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:

> Justin Balthrop wrote:
>> Hi everyone,
>>
>> I've been reading the dev and user mailing lists for the past month
>> or so, but haven't posted yet. I've fallen in love with couchdb, its
>> power and simplicity, and I tell everyone who will listen why it is
>> so much better than a relational db for most applications. I now have
>> most of the engineering team at our company on board, and I'm in the
>> process of converting our rails site from postgres to couchdb.
>>
>> So, after spending a few weeks converting models over to using
>> couchdb, there is one feature that we are desperately missing:
>>
>> Multi-level map-reduce in views.
>>
>> We need a way to take the output of reduce and pass it back through
>> another map-reduce step (multiple times in some cases). This way, we
>> could build map-reduce flows that compute (and cache) any complex
>> data computation we need.
>>
>> Our specific use case isn't incredibly important, because multi-level
>> map-reduce could be useful in countless ways, but I'll include it
>> anyway just as illustration.
>> The specific need for us arose from the desire to slice up certain
>> very large documents to make concurrent editing by a huge number of
>> users feasible. Then we started to use a view step to combine the
>> data back into whole documents. This worked really well at first, but
>> we soon found that we needed to run additional queries on those
>> documents. So we were stuck with either:
>>
>> 1) do the queries in the client - meaning we lose all the power and
>> caching of couchdb views; or
>> 2) reinsert the combined documents into another database - meaning we
>> are storing the data twice, and we still have to deal with contention
>> when modifying the compound documents in that database.
>>
>> Multi-level map-reduce would solve this problem perfectly!
>>
>> Multi-level views could also simplify and improve performance for
>> reduce grouping. The reduce itself would work just like Google's
>> map-reduce by only reducing values that have the exact same map key.
>> Then if you want to reduce further, you can just use another
>> map-reduce step on top of that with the map emitting a different key
>> so the reduce data will be grouped differently. For example, if you
>> wanted a count of posts per user and total posts, you would implement
>> it as a two-level map-reduce with the key=user_id for map1 and the
>> key=null for map2.
>>
>> This way, you only calculate reduce values for groupings you care
>> about, and any particular reduce value is immediately available from
>> the cached B+tree values without further computation. There is more
>> burden on the user to specify ahead of time which groupings they
>> need, but the performance and flexibility would be well worth it.
>> This eliminates the need to store reduce values internally in the map
>> B+tree. But it does mean that you would need a B+tree for each reduce
>> grouping to keep incremental reduce updates fast.
>> The improved performance comes from the fact that view queries would
>> never need to aggregate reduce values across multiple nodes or do any
>> re-reducing.
>>
>> Does this make sense? What do you guys think? Have you discussed the
>> possibility of such a feature?
>>
>> I'd be happy to discuss it further and even help with the
>> implementation, though I've only done a little bit of coding in
>> Erlang. I'm pretty sure this would mean big changes to the couchdb
>> internals, so I want to get your opinions and criticisms before I get
>> my hopes up or dive into any coding.
>>
>> Cheers,
>> Justin Balthrop
>>
> Possible solution, I use it in my production ...
> https://issues.apache.org/jira/browse/COUCHDB-249
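
For illustration, the two-level flow described in the quoted message
(a count of posts per user, then total posts) could be sketched in Python
roughly as follows. This is only an in-memory simulation under stated
assumptions, not CouchDB code and not the COUCHDB-249 patch: `map_reduce`,
`map1`, `map2`, and the sample `posts` documents are hypothetical names
made up for the example.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Run one map-reduce level (hypothetical helper, not a CouchDB API).

    Values are grouped only by exact key equality, then each group is
    reduced once, so no re-reduce across different keys is needed.
    """
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return [{"key": key, "value": reduce_fn(values)}
            for key, values in groups.items()]


# Hypothetical sample documents, one per post.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]


def map1(doc):
    # Level 1: key=user_id, as in the quoted example.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: consumes level-1 reduce rows and emits key=null (None).
    yield None, row["value"]


per_user = map_reduce(posts, map1, sum)   # posts per user
total = map_reduce(per_user, map2, sum)   # total posts

print(per_user)
print(total)
```

Here `per_user` plays the role of the first level's reduce output, and
`total` comes from feeding those rows through a second map with a null
key, so each grouping is reduced only over rows that share the exact same
key.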