Message-ID: <4A2798BF.8070407@avicomp.com>
Date: Thu, 4 Jun 2009 13:49:51 +0400
From: Viacheslav Seledkin
To: "dev@couchdb.apache.org"
Subject: Re: multi-level views
References: <1D7D2ECC-A0F7-4E04-B1FA-299132A1B1B3@geni.com> <4A263772.50204@avicomp.com>

Chris Anderson wrote:
> On Wed,
Jun 3, 2009 at 12:03 PM, Justin Balthrop wrote:
>
>> Nice! That sounds like exactly what I'm looking for. I don't think it will
>> address the performance issues with reduce, but it's definitely a start.
>>
>> Do you mind sending a diff of your changes to couch_view_updater.erl? I
>> diffed your file with trunk and there are a bunch of unrelated changes, of
>> course.
>>
>
> There's also Paul Davis's Cascade:
> http://github.com/davisp/cascade/tree/master
>
> I'm planning on writing something with Hovercraft that takes a group
> reduce query and copies it to another database on demand. It wouldn't
> try to be incremental, just provide for easy chaining.
>
> I think chaining by copying to a db is a good way to work, because it
> lets you experiment with other views on top of your reduce rows,
> without regenerating the whole thing.
>
> Chris
>
>> Thanks
>>
>> On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:
>>
>>> Justin Balthrop wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've been reading the dev and user mailing lists for the past month or
>>>> so, but haven't posted yet. I've fallen in love with couchdb, its
>>>> power and simplicity, and I tell everyone who will listen why it is so
>>>> much better than a relational db for most applications. I now have
>>>> most of the engineering team at our company on board, and I'm in the
>>>> process of converting our rails site from postgres to couchdb.
>>>>
>>>> So, after spending a few weeks converting models over to using
>>>> couchdb, there is one feature that we are desperately missing:
>>>>
>>>> Multi-level map-reduce in views.
>>>>
>>>> We need a way to take the output of reduce and pass it back through
>>>> another map-reduce step (multiple times in some cases). This way, we
>>>> could build map-reduce flows that compute (and cache) any complex data
>>>> computation we need.
>>>>
>>>> Our specific use case isn't incredibly important, because multi-level
>>>> map-reduce could be useful in countless ways, but I'll include it
>>>> anyway just as illustration. The specific need for us arose from the
>>>> desire to slice up certain very large documents to make concurrent
>>>> editing by a huge number of users feasible. Then we started to use a
>>>> view step to combine the data back into whole documents. This worked
>>>> really well at first, but we soon found that we needed to run
>>>> additional queries on those documents. So we were stuck with either:
>>>>
>>>> 1) do the queries in the client - meaning we lose all the power and
>>>> caching of couchdb views; or
>>>> 2) reinsert the combined documents into another database - meaning we
>>>> are storing the data twice, and we still have to deal with contention
>>>> when modifying the compound documents in that database.
>>>>
>>>> Multi-level map-reduce would solve this problem perfectly!
>>>>
>>>> Multi-level views could also simplify and improve performance for
>>>> reduce grouping. The reduce itself would work just like Google's
>>>> map-reduce by only reducing values that have the exact same map key. Then
>>>> if you want to reduce further, you can just use another map-reduce
>>>> step on top of that with the map emitting a different key so the
>>>> reduce data will be grouped differently. For example, if you wanted a
>>>> count of posts per user and total posts, you would implement it as a
>>>> two-level map-reduce with the key=user_id for map1 and the key=null
>>>> for map2.
>>>>
>>>> This way, you only calculate reduce values for groupings you care
>>>> about, and any particular reduce value is immediately available from
>>>> the cached B+tree values without further computation. There is more
>>>> burden on the user to specify ahead of time which groupings they need,
>>>> but the performance and flexibility would be well worth it.
>>>> This eliminates the need to store reduce values internally in the map
>>>> B+tree. But it does mean that you would need a B+tree for each reduce
>>>> grouping to keep incremental reduce updates fast. The improved
>>>> performance comes from the fact that view queries would never need to
>>>> aggregate reduce values across multiple nodes or do any re-reducing.
>>>>
>>>> Does this make sense? What do you guys think? Have you discussed the
>>>> possibility of such a feature?
>>>>
>>>> I'd be happy to discuss it further and even help with the
>>>> implementation, though I've only done a little bit of coding in
>>>> Erlang. I'm pretty sure this would mean big changes to the couchdb
>>>> internals, so I want to get your opinions and criticisms before I get
>>>> my hopes up or dive into any coding.
>>>>
>>>> Cheers,
>>>> Justin Balthrop
>>>>
>>> Possible solution, I use it in my production ...
>>> https://issues.apache.org/jira/browse/COUCHDB-249
>>>
>>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>

Attached patch to support multilevel views
https://issues.apache.org/jira/browse/COUCHDB-249
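
For readers following the thread, the two-level flow from Justin's original message (posts per user with key=user_id, then total posts with key=null) can be sketched roughly as below. This is only an in-memory illustration in Python, not CouchDB view code and not the COUCHDB-249 patch; the helper `map_reduce`, the functions `map1` and `map2`, and the sample documents are all hypothetical names made up for the example.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Run one map-reduce level (hypothetical helper, not a CouchDB API).

    Values are grouped by their exact emitted key, then each group is
    reduced once, as described in the quoted message.
    """
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return [{"key": key, "value": reduce_fn(values)}
            for key, values in groups.items()]


# Made-up sample documents, one per post.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]


def map1(doc):
    # Level 1: key = user_id, so the reduce yields a count per user.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: consumes level-1 reduce rows; key = None groups everything.
    yield None, row["value"]


per_user = map_reduce(posts, map1, sum)   # posts per user
total = map_reduce(per_user, map2, sum)   # total posts

print(per_user)
print(total)
```

The second level only ever sees the reduce rows of the first, which is the chaining property the thread is asking for; how those rows would be stored and updated incrementally inside CouchDB is not covered by this sketch.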