couchdb-user mailing list archives

From "Chris Anderson" <jch...@grabb.it>
Subject Re: CouchDB as MapReduce framework?
Date Sun, 14 Sep 2008 14:56:24 GMT
On Sun, Sep 14, 2008 at 5:44 AM, Hendrijk Templehov
<nuzux05@googlemail.com> wrote:
> So, to come to the point: The second map/reduce job (actually counting
> the words) is fully done by application logic. After the
> word_count/count-view is executed to CouchDB, CouchDB itself is not
> anymore related to what you're doing there. If you imagine a task
> where more than one (or more than ten) map/reduce-jobs are involved,
> only the first one is executed via CouchDB itself. This way you lose
> CouchDB's distributed features, because you simply rely on your own
> application.
>

Hendrijk,

You're correct that CouchDB does not currently support chained
map-reduce jobs. This is because the incremental update feature (where
only changes to the database have to be taken into account between
queries to the view) doesn't have a facility to expire view-rows that
are attached to original documents only through another map/reduce
job.

I've had success copying the output of a map/reduce view into another
database, and then running another set of views on it. There has been
some talk about how to do that while preserving the incremental update
features, but I haven't heard of an implementation yet.
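
A minimal sketch of that copy step, for illustration only. It assumes the
view is queried over HTTP with group=true and that the rows are written to
the second database through the _bulk_docs endpoint. The names rows_to_docs
and copy_view and both URL parameters are hypothetical, and the sketch does
not handle updating documents that already exist in the target database, so
it does not preserve incremental updates.

```python
import json
import urllib.request


def rows_to_docs(rows):
    """Turn reduced view rows ({"key": ..., "value": ...}) into new documents."""
    docs = []
    for row in rows:
        # Derive the _id from the key so the same key always maps to the same document.
        doc_id = json.dumps(row["key"], sort_keys=True)
        docs.append({"_id": doc_id, "key": row["key"], "value": row["value"]})
    return docs


def copy_view(view_url, target_db_url):
    """Fetch grouped view output and bulk-insert it into another database.

    Both URLs are supplied by the caller; nothing here is specific to a
    particular CouchDB version's view path layout.
    """
    with urllib.request.urlopen(view_url + "?group=true") as resp:
        rows = json.load(resp)["rows"]
    body = json.dumps({"docs": rows_to_docs(rows)}).encode("utf-8")
    req = urllib.request.Request(
        target_db_url + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A second set of views can then be defined on the target database, mapping
over the copied "key" and "value" fields.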

As far as my examples go, it is possible to request from CouchDB a
list of all the words in the books, and a count of each word (across
all books or from each individually) through use of the group_level
query parameter. What is *not* supported currently is outputting the
top N words by count. Your application will have to download the
unique list of words with their counts, and sort by count outside of
CouchDB.
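
The client-side sort could look roughly like this. It is a sketch that
assumes the grouped rows have already been downloaded and parsed, with the
word as the row key and the count as the row value. The function name
top_n_words and the sample rows are made up for illustration.

```python
import heapq


def top_n_words(rows, n):
    """Return the n (word, count) pairs with the highest counts."""
    pairs = [(row["key"], row["value"]) for row in rows]
    # nlargest avoids a full sort when n is small relative to the list.
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


# Sample rows shaped like the "rows" array of a grouped view response.
rows = [
    {"key": "a", "value": 7},
    {"key": "couch", "value": 2},
    {"key": "the", "value": 12},
    {"key": "relax", "value": 5},
]
print(top_n_words(rows, 2))  # [('the', 12), ('a', 7)]
```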

Group_level examples are available in the CouchDB unit tests (see reduce):
http://svn.apache.org/repos/asf/incubator/couchdb/trunk/share/www/script/couch_tests.js

There is also some example code from me that uses group_level:
http://jchris.mfdz.com/code/2008/6/markov_chains_using_couchdb_s_g

-- 
Chris Anderson
http://jchris.mfdz.com
