incubator-couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: multi-level views
Date Thu, 04 Jun 2009 17:32:24 GMT
On Thu, Jun 4, 2009 at 2:49 AM, Viacheslav Seledkin
<viacheslav.seledkin@avicomp.com> wrote:
> Chris Anderson wrote:
>>
>> On Wed, Jun 3, 2009 at 12:03 PM, Justin Balthrop <justin@geni.com> wrote:
>>
>>>
>>> Nice! That sounds like exactly what I'm looking for. I don't think it will
>>> address the performance issues with reduce, but it's definitely a start.
>>>
>>> Do you mind sending a diff of your changes to couch_view_updater.erl? I
>>> diffed your file with trunk and there are a bunch of unrelated changes, of
>>> course.
>>>
>>
>> There's also Paul Davis's Cascade:
>> http://github.com/davisp/cascade/tree/master
>>
>> I'm planning on writing something with Hovercraft that takes a group
>> reduce query and copies it to another database on demand. It wouldn't
>> try to be incremental, just provide for easy chaining.
>>
>> I think chaining by copying to a db is a good way to work, because it
>> lets you experiment with other views on top of your reduce rows,
>> without regenerating the whole thing.
>>
>> Chris
>>
>>
>>>
>>> Thanks
>>>
>>>
>>> On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:
>>>
>>>
>>>>
>>>> Justin Balthrop wrote:
>>>>
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I've been reading the dev and user mailing lists for the past month or
>>>>> so, but haven't posted yet. I've fallen in love with couchdb, its
>>>>> power and simplicity, and I tell everyone who will listen why it is so
>>>>> much better than a relational db for most applications. I now have
>>>>> most of the engineering team at our company on board, and I'm in the
>>>>> process of converting our rails site from postgres to couchdb.
>>>>>
>>>>> So, after spending a few weeks converting models over to using
>>>>> couchdb, there is one feature that we are desperately missing:
>>>>>
>>>>> Multi-level map-reduce in views.
>>>>>
>>>>> We need a way to take the output of reduce and pass it back through
>>>>> another map-reduce step (multiple times in some cases). This way, we
>>>>> could build map-reduce flows that compute (and cache) any complex data
>>>>> computation we need.
>>>>>
>>>>> Our specific use case isn't incredibly important, because multi-level
>>>>> map-reduce could be useful in countless ways, but I'll include it
>>>>> anyway just as illustration. The specific need for us arose from the
>>>>> desire to slice up certain very large documents to make concurrent
>>>>> editing by a huge number of users feasible. Then we started to use a
>>>>> view step to combine the data back into whole documents. This worked
>>>>> really well at first, but we soon found that we needed to run
>>>>> additional queries on those documents. So we were stuck with either:
>>>>>
>>>>> 1) do the queries in the client - meaning we lose all the power and
>>>>> caching of couchdb views; or
>>>>> 2) reinsert the combined documents into another database - meaning we
>>>>> are storing the data twice, and we still have to deal with contention
>>>>> when modifying the compound documents in that database.
>>>>>
>>>>> Multi-level map-reduce would solve this problem perfectly!
>>>>>
>>>>> Multi-level views could also simplify and improve performance for
>>>>> reduce grouping. The reduce itself would work just like Google's
>>>>> map-reduce by only reducing values that have the exact same map key. Then
>>>>> if you want to reduce further, you can just use another map-reduce
>>>>> step on top of that with the map emitting a different key so the
>>>>> reduce data will be grouped differently. For example, if you wanted a
>>>>> count of posts per user and total posts, you would implement it as a
>>>>> two-level map-reduce with the key=user_id for map1 and the key=null
>>>>> for map2.
>>>>>
>>>>> This way, you only calculate reduce values for groupings you care
>>>>> about, and any particular reduce value is immediately available from
>>>>> the cached B+tree values without further computation. There is more
>>>>> burden on the user to specify ahead of time which groupings they need,
>>>>> but the performance and flexibility would be well worth it. This
>>>>> eliminates the need to store reduce values internally in the map
>>>>> B+tree. But it does mean that you would need a B+tree for each reduce
>>>>> grouping to keep incremental reduce updates fast. The improved
>>>>> performance comes from the fact that view queries would never need to
>>>>> aggregate reduce values across multiple nodes or do any re-reducing.
>>>>>
>>>>> Does this make sense? What do you guys think? Have you discussed the
>>>>> possibility of such a feature?
>>>>>
>>>>> I'd be happy to discuss it further and even help with the
>>>>> implementation, though I've only done a little bit of coding in
>>>>> Erlang. I'm pretty sure this would mean big changes to the couchdb
>>>>> internals, so I want to get your opinions and criticisms before I get
>>>>> my hopes up or dive into any coding.
>>>>>
>>>>> Cheers,
>>>>> Justin Balthrop
>>>>>
>>>>> .
>>>>>
>>>>>
>>>>>
>>>>
>>>> Possible solution, I use it in my production ...
>>>> https://issues.apache.org/jira/browse/COUCHDB-249
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>> .
>>
>>
>
> Attached patch to support multilevel views
>

This works afaik, but the style of updating the results database
during view updates is a little complex. The code I'm working on
should be just a few lines concentrated in a couple of functions.
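
To make the chaining idea concrete, here is a rough sketch of the two-level
example quoted above (posts per user, then total posts), where the first
level's reduce rows are copied into a second set of documents and reduced
again. It is plain in-memory Python for illustration only: it is not
Hovercraft, not the COUCHDB-249 patch, and not CouchDB's view engine. The
names map_reduce, posts, and level2_db are hypothetical.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group emitted values by exact key, then reduce each group once."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


# Hypothetical source documents standing in for the first database.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]

# Level 1: key = user_id, one count per post.
per_user = map_reduce(posts, lambda doc: [(doc["user_id"], 1)], sum)

# "Copy to another database": each reduce row becomes its own document.
level2_db = [{"_id": user, "count": count} for user, count in per_user.items()]

# Level 2: key = None, so every copied row lands in a single group.
total = map_reduce(level2_db, lambda doc: [(None, doc["count"])], sum)
```

Because the level-1 rows live on as ordinary documents, other views can be
tried on top of them without recomputing the first level. As with the
approach described above, this sketch makes no attempt to be incremental.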

> https://issues.apache.org/jira/browse/COUCHDB-249
>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
