couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Justin Balthrop <jus...@geni.com>
Subject Re: multi-level views
Date Wed, 03 Jun 2009 19:03:07 GMT
Nice! That sounds like exactly what I'm looking for. I don't think it  
will address the performance issues with reduce, but it's definitely a  
start.

Do you mind sending a diff of your changes to couch_view_updater.erl?  
I diffed your file with trunk and there are a bunch of unrelated  
changes, of course.

Thanks


On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:

> Justin Balthrop wrote:
>> Hi everyone,
>>
>> I've been reading the dev and user mailing lists for the past month  
>> or
>> so, but haven't posted yet. I've fallen in love with couchdb, its
>> power and simplicity, and I tell everyone who will listen why it is  
>> so
>> much better than a relational db for most applications. I now have
>> most of the engineering team at our company on board, and I'm in the
>> process of converting our rails site from postgres to couchdb.
>>
>> So, after spending a few weeks converting models over to using
>> couchdb, there is one feature that we are desperately missing:
>>
>> Multi-level map-reduce in views.
>>
>> We need a way to take the output of reduce and pass it back through
>> another map-reduce step (multiple times in some cases). This way, we
>> could build map-reduce flows that compute (and cache) any complex  
>> data
>> computation we need.
>>
>> Our specific use case isn't incredibly important, because multi-level
>> map-reduce could be useful in countless ways, but I'll include it
>> anyway just as illustration. The specific need for us arose from the
>> desire to slice up certain very large documents to make concurrent
>> editing by a huge number of users feasible. Then we started to use a
>> view step to combine the data back into whole documents. This worked
>> really well at first, but we soon found that we needed to run
>> additional queries on those documents. So we were stuck with either:
>>
>> 1) do the queries in the client - meaning we lose all the power and
>> caching of couchdb views; or
>> 2) reinsert the combined documents into another database - meaning we
>> are storing the data twice, and we still have to deal with contention
>> when modifying the compound documents in that database.
>>
>> Multi-level map-reduce would solve this problem perfectly!
>>
>> Multi-level views could also simplify and improve performance for
>> reduce grouping. The reduce itself would work just like Google's map-
>> reduce by only reducing values that have the exact same map key. Then
>> if you want to reduce further, you can just use another map-reduce
>> step on top of that with the map emitting a different key so the
>> reduce data will be grouped differently. For example, if you wanted a
>> count of posts per user and total posts, you would implement it as a
>> two-level map-reduce with the key=user_id for map1 and the key=null
>> for map2.
>>
>> This way, you only calculate reduce values for groupings you care
>> about, and any particular reduce value is immediately available from
>> the cached B+tree values without further computation. There is more
>> burden on the user to specify ahead of time which groupings they  
>> need,
>> but the performance and flexibility would be well worth it. This
>> eliminates the need to store reduce values internally in the map B
>> +tree. But it does mean that you would need a B+tree for each reduce
>> grouping to keep incremental reduce updates fast. The improved
>> performance comes from the fact that view queries would never need to
>> aggregate reduce values across multiple nodes or do any re-reducing.
>>
>> Does this make sense? What do you guys think? Have you discussed the
>> possibility of such a feature?
>>
>> I'd be happy to discuss it further and even help with the
>> implementation, though I've only done a little bit of coding in
>> Erlang. I'm pretty sure this would mean big changes to the couchdb
>> internals, so I want to get your opinions and criticisms before I get
>> my hopes up or dive into any coding.
>>
>> Cheers,
>> Justin Balthrop
>>
>> .
>>
>>
> Possible solution, I use it in my production ...
> https://issues.apache.org/jira/browse/COUCHDB-249


Mime
View raw message