couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Chained MapReduce Jobs - again
Date Thu, 07 Oct 2010 15:06:03 GMT
It's not in BigCouch, but that's primarily because chainable map-reduce views are orthogonal
to the core BigCouch stuff.

The approach that we took at Cloudant is as follows: a view definition can have a field called
"dbcopy" in addition to the usual "map" and "reduce" fields.  The value of "dbcopy" is a string
specifying a database name used to store the view results.  When a dbcopy view is updated,
the group=true reduce rows in that view are stored as separate documents in the dbcopy database.
 The ID of each document is deterministically generated from the key of the view row.  This
allows the documents to be updated incrementally as the view is updated, including the removal
of documents corresponding to view rows which have been removed.  Each document contains "key"
and "value" fields, as well as some additional information about the individual primary database
shard contributions to that reduce value.
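Below is a minimal Python sketch of the incremental dbcopy update described above. Every detail is an assumption for illustration and is not Cloudant's code: the SHA-1 scheme in the hypothetical `doc_id_for_key`, the `REMOVED` sentinel, and the plain dict that stands in for the dbcopy database. The per-shard contribution data is left out. The sketch shows why a deterministic ID matters: a changed or removed view row maps straight to the one document it affects, so nothing has to be scanned.

```python
import hashlib
import json

# Hypothetical marker meaning "this key's reduce row has disappeared from the view".
REMOVED = object()

def doc_id_for_key(key):
    # Assumed scheme: SHA-1 of a canonical JSON encoding of the view key.
    # The only stated requirement is that the ID is a deterministic function of the key.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def apply_row_changes(changes, dbcopy):
    """Apply (key, value) changes from a group=true view update to `dbcopy`,
    a dict standing in for the dbcopy database (hypothetical)."""
    for key, value in changes:
        doc_id = doc_id_for_key(key)
        if value is REMOVED:
            # The row is gone from the view, so delete its document.
            dbcopy.pop(doc_id, None)
        else:
            # New or updated row: overwrite the document with the same ID.
            dbcopy[doc_id] = {"_id": doc_id, "key": key, "value": value}
```

For example, applying `[("couchdb", 3), ("erlang", 2)]` and then `[("couchdb", 4), ("erlang", REMOVED)]` to an empty dict leaves a single document for "couchdb" with value 4.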

One caveat is that the view must have a reduce function - if "dbcopy" is specified with a
map-only view it's simply ignored.  I suppose it would be possible to have a more flexible
setup in which map-only views could be persisted, although I think most of the things one
might want to accomplish by saving map-only views in a separate database can be accomplished
with a single map function.  The real power comes when reduce functions are involved, e.g.
the classic tag-count-sorted-by-frequency problem.
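To make the tag-count example concrete, here is a Python simulation of the two chained stages. It is not CouchDB code, and the function names are hypothetical. Stage one is the usual map that emits (tag, 1), paired with a summing reduce queried at group=true. Its rows are what would land in the dbcopy database. Stage two is a view over those dbcopy documents that emits (-count, tag), so plain ascending key order puts the most frequent tags first. Python's tuple ordering stands in for CouchDB's key collation.

```python
from collections import Counter

def stage1_tag_counts(docs):
    # Map emits (tag, 1) for each tag; a summing reduce at group=true
    # yields one row per distinct tag, returned in key order.
    counts = Counter(tag for doc in docs for tag in doc.get("tags", []))
    return [{"key": tag, "value": n} for tag, n in sorted(counts.items())]

def stage2_by_frequency(dbcopy_docs):
    # Second view over the dbcopy docs: key = (-count, tag), so ascending
    # order is highest count first, with ties broken alphabetically by tag.
    keys = sorted((-doc["value"], doc["key"]) for doc in dbcopy_docs)
    return [(tag, -neg_count) for neg_count, tag in keys]
```

With docs tagged `["couchdb", "erlang"]`, `["couchdb"]` and `["couchdb", "js", "erlang"]`, the chain yields `[("couchdb", 3), ("erlang", 2), ("js", 1)]`. A real second-stage view could instead emit the positive count and be queried with descending=true.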

We're interested in contributing this code if the design meets with the other committers'
approval.  We also want to make sure that it scales well to BigCouch clusters.  Our current
design in that department works, but is not very relaxed - it relies on global locks in the
distributed Erlang setup to ensure that only one source view index shard is updating a dbcopy
database shard at any given time.
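The locking constraint can be illustrated within a single Python process. Here one `threading.Lock` per dbcopy shard stands in for the distributed Erlang global locks, so the sketch shows the mutual exclusion only and says nothing about how it works across nodes. The merge step is an assumption built on the per-shard contribution data mentioned earlier. Each source shard records its partial value, and the stored value is recomputed as their sum, which presumes a summing reduce. `ShardLockRegistry` and `merge_contribution` are hypothetical names. The merge is a read-modify-write of one document, so two unsynchronized updaters could lose each other's contributions. Holding the shard's lock serializes them.

```python
import threading

class ShardLockRegistry:
    """Hypothetical local stand-in for cluster-wide locks: one lock per dbcopy shard."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, shard_name):
        # Create the shard's lock on first use; the guard makes creation thread-safe.
        with self._guard:
            lock = self._locks.get(shard_name)
            if lock is None:
                lock = self._locks[shard_name] = threading.Lock()
            return lock

def merge_contribution(registry, shard_name, dbcopy_shard, doc_id, key,
                       source_shard, partial):
    """Record one source shard's partial reduce value for `key` (assumes a summing reduce)."""
    with registry.lock_for(shard_name):
        # Read-modify-write of a single document, done while holding the shard lock.
        old = dbcopy_shard.get(doc_id, {"shards": {}})
        shards = dict(old["shards"])
        shards[source_shard] = partial
        dbcopy_shard[doc_id] = {"_id": doc_id, "key": key,
                                "value": sum(shards.values()), "shards": shards}
```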

Adam

On Oct 5, 2010, at 1:51 PM, Zachary Zolton wrote:

> I know that Cloudant has such a feature; however, I'm not sure whether it
> has made it into their BigCouch distribution.
> 
> If in the future CouchDB core decides to merge in functionality from
> BigCouch, then CouchDB could grow a chainable map-reduce.
> 
> On Tue, Oct 5, 2010 at 12:45 PM, Hendrijk Templehov
> <nuzux05@googlemail.com> wrote:
>> Hi there,
>> 
>> In Sep 2008 I asked on this list whether it was possible to have chained
>> MapReduce queries in CouchDB.
>> The answer was no, unfortunately.
>> 
>> Since then, I have read that many CouchDB users want such a
>> feature, so I would like to ask about the current status of the
>> community discussion on this point.
>> Will it be implemented in a future release?
>> 
>> 
>> Thanks
>> Hendrijk
>> 

