couchdb-dev mailing list archives

From Zachary Zolton <zachary.zol...@gmail.com>
Subject Re: multi-level views
Date Wed, 03 Jun 2009 20:09:48 GMT
Yeah... I was gonna recommend Cascade, but I haven't seen any movement
on GitHub for quite a while!

Perhaps Paul Davis would like to chime in...? :^q

I've been using an Update Notifier script for this kinda thing so
far (also not incrementally), but it's worked well enough for my needs.
My primary desire would be to do this in a manner such that the
application code doesn't need to know about the second database.

On Wed, Jun 3, 2009 at 2:29 PM, Chris Anderson <jchris@apache.org> wrote:
> On Wed, Jun 3, 2009 at 12:03 PM, Justin Balthrop <justin@geni.com> wrote:
>> Nice! That sounds like exactly what I'm looking for. I don't think it will
>> address the performance issues with reduce, but it's definitely a start.
>>
>> Do you mind sending a diff of your changes to couch_view_updater.erl? I
>> diffed your file with trunk and there are a bunch of unrelated changes, of
>> course.
>
> There's also Paul Davis's Cascade:
> http://github.com/davisp/cascade/tree/master
>
> I'm planning on writing something with Hovercraft that takes a group
> reduce query and copies it to another database on demand. It wouldn't
> try to be incremental, just provide for easy chaining.
>
> I think chaining by copying to a db is a good way to work, because it
> lets you experiment with other views on top of your reduce rows,
> without regenerating the whole thing.
>
> Chris
>
>>
>> Thanks
>>
>>
>> On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:
>>
>>> Justin Balthrop wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I've been reading the dev and user mailing lists for the past month or
>>>> so, but haven't posted yet. I've fallen in love with couchdb, its
>>>> power and simplicity, and I tell everyone who will listen why it is so
>>>> much better than a relational db for most applications. I now have
>>>> most of the engineering team at our company on board, and I'm in the
>>>> process of converting our rails site from postgres to couchdb.
>>>>
>>>> So, after spending a few weeks converting models over to using
>>>> couchdb, there is one feature that we are desperately missing:
>>>>
>>>> Multi-level map-reduce in views.
>>>>
>>>> We need a way to take the output of reduce and pass it back through
>>>> another map-reduce step (multiple times in some cases). This way, we
>>>> could build map-reduce flows that compute (and cache) any complex data
>>>> computation we need.
>>>>
>>>> Our specific use case isn't incredibly important, because multi-level
>>>> map-reduce could be useful in countless ways, but I'll include it
>>>> anyway just as illustration. The specific need for us arose from the
>>>> desire to slice up certain very large documents to make concurrent
>>>> editing by a huge number of users feasible. Then we started to use a
>>>> view step to combine the data back into whole documents. This worked
>>>> really well at first, but we soon found that we needed to run
>>>> additional queries on those documents. So we were stuck with either:
>>>>
>>>> 1) do the queries in the client - meaning we lose all the power and
>>>> caching of couchdb views; or
>>>> 2) reinsert the combined documents into another database - meaning we
>>>> are storing the data twice, and we still have to deal with contention
>>>> when modifying the compound documents in that database.
>>>>
>>>> Multi-level map-reduce would solve this problem perfectly!
>>>>
>>>> Multi-level views could also simplify and improve performance for
>>>> reduce grouping. The reduce itself would work just like Google's
>>>> map-reduce by only reducing values that have the exact same map key. Then
>>>> if you want to reduce further, you can just use another map-reduce
>>>> step on top of that with the map emitting a different key so the
>>>> reduce data will be grouped differently. For example, if you wanted a
>>>> count of posts per user and total posts, you would implement it as a
>>>> two-level map-reduce with the key=user_id for map1 and the key=null
>>>> for map2.
>>>>
>>>> This way, you only calculate reduce values for groupings you care
>>>> about, and any particular reduce value is immediately available from
>>>> the cached B+tree values without further computation. There is more
>>>> burden on the user to specify ahead of time which groupings they need,
>>>> but the performance and flexibility would be well worth it. This
>>>> eliminates the need to store reduce values internally in the map
>>>> B+tree. But it does mean that you would need a B+tree for each reduce
>>>> grouping to keep incremental reduce updates fast. The improved
>>>> performance comes from the fact that view queries would never need to
>>>> aggregate reduce values across multiple nodes or do any re-reducing.
>>>>
>>>> Does this make sense? What do you guys think? Have you discussed the
>>>> possibility of such a feature?
>>>>
>>>> I'd be happy to discuss it further and even help with the
>>>> implementation, though I've only done a little bit of coding in
>>>> Erlang. I'm pretty sure this would mean big changes to the couchdb
>>>> internals, so I want to get your opinions and criticisms before I get
>>>> my hopes up or dive into any coding.
>>>>
>>>> Cheers,
>>>> Justin Balthrop
>>>>
>>>> .
>>>>
>>>>
>>> Possible solution, I use it in my production ...
>>> https://issues.apache.org/jira/browse/COUCHDB-249
>>
>>
>
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>
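
The two-level example in Justin's original message (a count of posts per user, then total posts, with key=user_id for map1 and key=null for map2) can be sketched roughly as below. This is only an in-memory Python illustration of the chaining idea, not CouchDB's view engine and not incremental; the helper `map_reduce`, the sample `posts` documents, and the function names `map1` and `map2` are all assumptions made up for the example.

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Hypothetical helper: run map_fn over rows, group the emitted
    (key, value) pairs by exact key, then reduce each group once."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    # Each output row looks like a reduce row: {"key": ..., "value": ...}
    return [{"key": k, "value": reduce_fn(vs)} for k, vs in groups.items()]


# Made-up sample documents standing in for posts in a database.
posts = [
    {"_id": "p1", "user_id": "alice"},
    {"_id": "p2", "user_id": "bob"},
    {"_id": "p3", "user_id": "alice"},
]


def map1(doc):
    # Level 1: key = user_id, so reduce yields one count per user.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: key = None (null), so every per-user count lands in
    # a single group and reduce yields the grand total.
    yield None, row["value"]


per_user = map_reduce(posts, map1, sum)   # output of level 1
total = map_reduce(per_user, map2, sum)   # level 2 consumes level 1's rows

print(per_user)
print(total)
```

The point of the sketch is only that the second level consumes the reduce rows of the first, so each grouping is computed once for the key it was asked for; how those rows would be stored and updated incrementally is exactly the open question in this thread.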
