incubator-couchdb-user mailing list archives

From "Kropp, Henning" <hkr...@microlution.de>
Subject Re: Multiple map reduce stages
Date Thu, 20 May 2010 08:48:20 GMT
On 18.05.2010 20:16, J Chris Anderson wrote:
> On May 18, 2010, at 2:52 AM, Kropp, Henning wrote:
>
>   
>> Hi,
>>
>> as far as I know, working with map reduce commonly involves multiple map
>> and reduce stages. Does a view in CouchDB consist solely of one map and,
>> if necessary, one reduce stage? To have multiple map and reduce stages,
>> would one have to chain views in CouchDB? How can I do that? Is it
>> possible to give the function(doc){..} another parameter? Show functions,
>> for example, have the extra parameter req for the HTTP request.
>> Unfortunately my JavaScript knowledge of the underlying prototype
>> concept is not very solid, which might have been helpful here.
>>
>> Kind regards and many thanks in advance
>>     
>
> CouchDB Map Reduce is a realtime incremental model, so it is quite different from the
> Hadoop-style batch model. Of course you can still chain map reduce by copying the rows from
> a view query to a new db, and writing another view on the new db.
>
> Chris

That is interesting to know. Hive adopts the batch model but obviously
serves a different purpose.

I was asking because of an actual problem I am having; maybe someone can
help. I would like to group documents by a value, but only those
documents within a certain time interval. In this scenario CouchDB is
used for logging, which might not be a purpose it was initially
designed for.

I came up with the following solution: grouping by value (URI) and time
using group_level=1 and a start and end key, as follows:

/_temp_view?group=true&group_level=1&startkey=[1270826004.0]&endkey=[{},1270826011.0]

and simply counting

function(doc) { emit([doc.URI, doc.Time], 1); }

Experienced CouchDB users might already see that this results in all
documents being grouped, regardless of the time set in the start and
end key. It took me some time to figure out why, and I finally
recognized the problem, even though I cannot explain it properly and
may be totally wrong after all.
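
One possible explanation, sketched under assumptions: CouchDB view collation orders keys by type (numbers before strings, strings before arrays, arrays before objects) and compares array keys element by element. With keys of the form [URI, Time], the start key begins with a number and the end key begins with {}, so every key whose first element is a string falls inside the range and the time is never compared. The helpers typeRank, collate and inRange below are hypothetical, simplified stand-ins and not CouchDB code; strings are compared in plain code-point order rather than with ICU, and the sample keys are made up.

```javascript
// Simplified stand-in for CouchDB view collation (hypothetical, not CouchDB code).
// Assumed type order: null < booleans < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // plain object
}

function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;              // different types: type order decides
  if (ra === 1) return Number(a) - Number(b); // false < true
  if (ra === 2) return a - b;
  if (ra === 3) return a < b ? -1 : a > b ? 1 : 0; // code-point order, not ICU
  if (ra === 4) {
    // Arrays: compare element by element, shorter array sorts first on a tie.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are not compared further in this sketch
}

function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

// The range from the query above.
const startkey = [1270826004.0];
const endkey = [{}, 1270826011.0];

// Made-up keys in the [URI, Time] shape; two of the three times lie
// outside the intended interval.
const keys = [
  ["/index.html", 1000000000.0],
  ["/about.html", 1270826007.0],
  ["/index.html", 1999999999.0],
];

// Every key is selected, because only the first element (the URI) is
// ever needed to decide the comparison against startkey and endkey.
const matched = keys.filter((k) => inRange(k, startkey, endkey));
console.log(matched.length);
```

If this reading is right, the range only restricts the first key element, so a time restriction would need the time to be the leading part of the key.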

So I thought it might help to first map the documents by the time value
and, in a next step, map and reduce them by the URI value. A different
approach I came up with would be to design a third value for each
document, consisting of a combination of time and URI, and to work with
that as the key.
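
The first idea (restrict by time, then group by URI) can be sketched outside CouchDB as two plain passes over the documents. This is only an in-memory illustration with made-up documents, and countByUriInInterval is a hypothetical helper. In practice the first stage might be a view that emits doc.Time as the key and is queried with startkey and endkey, with the grouping done by the client or in a second database as Chris suggests.

```javascript
// Hypothetical helper: two-stage count of documents per URI within a time interval.
function countByUriInInterval(docs, startTime, endTime) {
  // Stage 1: keep only documents inside the interval (what a startkey/endkey
  // range on a view keyed by Time alone would select).
  const inInterval = docs.filter(
    (d) => d.Time >= startTime && d.Time <= endTime
  );

  // Stage 2: group the remaining documents by URI and count them.
  const counts = {};
  for (const d of inInterval) {
    counts[d.URI] = (counts[d.URI] || 0) + 1;
  }
  return counts;
}

// Made-up log documents using the URI and Time fields from the map function.
const docs = [
  { URI: "/index.html", Time: 1270826003.0 },
  { URI: "/index.html", Time: 1270826005.0 },
  { URI: "/about.html", Time: 1270826006.0 },
  { URI: "/index.html", Time: 1270826010.0 },
  { URI: "/about.html", Time: 1270826012.0 },
];

const counts = countByUriInInterval(docs, 1270826004.0, 1270826011.0);
console.log(counts);
```

The design choice here is to let the range query do only the time filtering and to move the grouping into a second step, since a single array key cannot both lead with the time (for the range) and lead with the URI (for group_level=1).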

Maybe and hopefully there is even a third approach I am not thinking of.
I really appreciate the help.

Thanks
