incubator-couchdb-user mailing list archives

From James Marca <jma...@translab.its.uci.edu>
Subject Re: Chaining of views/MapReduce
Date Mon, 22 Feb 2010 18:16:23 GMT
On Fri, Feb 19, 2010 at 10:10:23AM -0500, J Chris Anderson wrote:
> 
> On Feb 17, 2010, at 5:29 PM, Norman Rosner wrote:
> 
> > 
> > On 17.02.2010, at 23:15, Mario Scheliga wrote:
> > 
> >> Hi Norman,
> >> 
> >> updating a document from a map function is not possible and seems to be the wrong way.
> >> Think of the map function as processing docs separately (sandboxed), so you are able to
> >> spread the execution over thousands of servers ;-)
> > 
> > True that! But: suppose I'm just creating/updating one document per CouchDB instance;
> > that should be ok, right? Because after that, I can easily get all the result documents and
> > merge them together. I would do it in a similar way in Hadoop. And as far as I read in the
> > loooong archives of this list, I'm not the only one who wants to do such things.
> 
> 
> The "proper" way to do this is to have a simple CouchDB map reduce view that is the 1st
> phase of your chain.
> 
> Then query the view with group=true and store the output into an empty db (one document
> per row).
> 
> Now you can write another view on top of the derived db to do the second phase (sort
> by value, etc).
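
The steps quoted above can be sketched in a few lines of client code. This is only a minimal sketch, not an official recipe: the helper names (`rows_to_docs`, `fetch_grouped_rows`, `bulk_save`) and the choice of using the JSON-encoded view key as the derived document's `_id` are assumptions made for illustration. The HTTP endpoints used are the standard view query URL and `_bulk_docs`.

```python
import json
import urllib.request


def rows_to_docs(rows):
    """Turn reduced view rows into one document per row.

    Assumption: the JSON-encoded view key is used as the document _id, so
    the same key always maps to the same document in the derived db.
    """
    docs = []
    for row in rows:
        doc_id = json.dumps(row["key"], sort_keys=True, separators=(",", ":"))
        docs.append({"_id": doc_id, "key": row["key"], "value": row["value"]})
    return docs


def fetch_grouped_rows(base_url, db, ddoc, view):
    """Phase 1: query the map/reduce view with group=true."""
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}?group=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["rows"]


def bulk_save(base_url, db, docs):
    """Store the derived documents in the (initially empty) target db."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=json.dumps({"docs": docs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, `bulk_save(url, "derived", rows_to_docs(fetch_grouped_rows(url, "source", "stats", "by_tag")))` would fill the derived db (the database, design document, and view names here are placeholders). The second-phase view is then defined on the derived db, for example one that emits `doc.value` as its key in order to sort by value.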

Forgive me in advance, I have no Erlang skills and no time or ability
to submit a patch, but I have to ask.  Are there any plans in the
development roadmap to make this less of a kludge and more of a core
feature?

I see two problems with the current proper way.  First, it seems
wasteful of disk space to have a view generated and then store
essentially the same thing as a separate db.  Second and more
importantly, as a developer you have to write and maintain long-running
code that watches the source database and updates the chain of
view->db->view->db...->view when the source db data changes.  It would
be nicer if CouchDB could manage all that internally.  Perhaps the map
code could explicitly dump to a db, maybe something like emit_chained
with a required target db as a third argument, so that changes to the
source database can get cascaded automatically.
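
To make the second problem concrete, here is a rough sketch of the bookkeeping a client has to do today each time the source data changes. It is not a CouchDB feature, and `plan_sync` is a hypothetical helper name; it assumes derived documents use the JSON-encoded view key as `_id`, and that `existing` maps each derived `_id` to its current `_rev` (as could be read from the derived db's `_all_docs`).

```python
import json


def plan_sync(rows, existing):
    """Build a _bulk_docs batch that brings the derived db in line with
    the current view rows.

    rows:     current rows of the first-phase view (queried with group=true)
    existing: dict mapping derived-db _id -> _rev
    """
    batch = []
    seen = set()
    for row in rows:
        doc_id = json.dumps(row["key"], sort_keys=True, separators=(",", ":"))
        seen.add(doc_id)
        doc = {"_id": doc_id, "key": row["key"], "value": row["value"]}
        if doc_id in existing:
            # Updating an existing document requires its current revision.
            doc["_rev"] = existing[doc_id]
        batch.append(doc)
    # Keys that no longer appear in the view must be deleted downstream.
    for doc_id, rev in existing.items():
        if doc_id not in seen:
            batch.append({"_id": doc_id, "_rev": rev, "_deleted": True})
    return batch
```

Note that this simple version rewrites every derived document on each run, even unchanged ones, and it has to be repeated for every link in the chain, which is exactly the kind of code it would be nice not to have to write.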

Regards,
James
