Message-ID: <4BF4F754.7080806@microlution.de>
Date: Thu, 20 May 2010 10:48:20 +0200
From: "Kropp, Henning" <hkropp@microlution.de>
To: user@couchdb.apache.org
Subject: Re: Multiple map reduce stages

On 18.05.2010 20:16, J Chris Anderson wrote:
> On May 18, 2010, at 2:52 AM, Kropp, Henning wrote:
>
>   
>> Hi,
>>
>> as far as I know, working with map reduce commonly involves multiple map
>> and reduce stages. A view in CouchDB consists solely of one map and, if
>> necessary, one reduce stage? To have multiple map and reduce stages, one
>> would have to chain views in CouchDB? How can I do that? Is it
>> possible to give the function(doc){..} another parameter? There are the
>> shows, which have the extra parameter req for the HTTP request.
>> Unfortunately my JavaScript knowledge of the underlying prototype
>> concept is not very well founded, which could be helpful here.
>>
>> Kind regards and many thanks in advance
>>     
>
> CouchDB Map Reduce is a realtime incremental model, so it is quite different from the Hadoop-style batch model. Of course you can still chain map reduce by copying the rows from a view query to a new db, and writing another view on the new db.
>
> Chris

That is interesting to know. Hive adopts the batch model but obviously
serves a different purpose.

I was asking because of an actual problem I am having; maybe someone can
help. I would like to group documents by a value, but only those
documents within a certain time interval. In this scenario CouchDB is
used for logging, which might not be a purpose it was originally
designed for.

I came up with the following solution: grouping by value (URI) and time,
using group_level=1 and start and end keys as follows:

/_temp_view?group=true&group_level=1&startkey=[1270826004.0]&endkey=[{},1270826011.0]

and simply counting

function(doc) { emit([doc.URI, doc.Time], 1); }
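
A counting reduce to go with this map could look like the sketch below. It is an assumption, since only the map function is shown above. sum is the helper that CouchDB's JavaScript view server provides; it is redefined here, together with a stand-in emit and three hypothetical log documents, only so that the snippet runs outside CouchDB.

```javascript
// Stand-ins so the sketch runs outside CouchDB (assumptions, not CouchDB code).
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Map: one row per log document, keyed by [URI, Time].
var map = function (doc) { emit([doc.URI, doc.Time], 1); };

// Reduce: add up the 1s. On rereduce the values are partial counts,
// so summing is correct in both cases.
var reduce = function (keys, values, rereduce) { return sum(values); };

// Hypothetical log documents, for illustration only.
var docs = [
  { URI: "/a", Time: 1270826005.0 },
  { URI: "/a", Time: 1270826009.0 },
  { URI: "/b", Time: 1270826007.0 }
];
docs.forEach(map);

// Reduce over all emitted rows at once (no grouping).
var total = reduce(
  rows.map(function (r) { return r.key; }),
  rows.map(function (r) { return r.value; }),
  false
);
```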

Experienced CouchDB users might already see that this results in all
documents being grouped, regardless of the time set in the start and
end keys. It took me some time to figure out why. I finally recognized
the problem, even though I cannot explain it properly, and maybe I am
totally wrong after all.
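
One possible explanation, sketched under the assumption that keys are compared in CouchDB's documented view collation order (null, false, true, numbers, strings, arrays, objects, with arrays compared element by element): the first element of the key decides the comparison, so a start key beginning with a number sorts before every key beginning with a URI string, and an end key beginning with {} sorts after all of them. The time in the second position is then never consulted. The comparator below is a simplified model, not CouchDB's implementation; in particular it compares strings by code unit rather than by ICU rules, and the sample URI is hypothetical.

```javascript
// Rank of each JSON type in the (simplified) collation order.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort last
}

// Negative if a sorts before b, positive if after, 0 if equal.
function collate(a, b) {
  var ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : (a > b ? 1 : 0);
  if (ra === 5) {
    // Arrays: the first differing element decides; shorter arrays sort first.
    for (var i = 0; i < Math.min(a.length, b.length); i++) {
      var c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are treated as equal in this sketch
}

function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

var startkey = [1270826004.0];
var endkey = [{}, 1270826011.0];

// A key whose time is far outside the intended interval still falls in range.
var outsideInterval = inRange(["/index.html", 1.0], startkey, endkey);
var insideInterval = inRange(["/index.html", 1270826005.0], startkey, endkey);
```

Both calls return true in this model, which would match the observation that the time bounds have no effect on the result.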

So I thought it might help to first map the documents by the time value
and, in a next step, map and reduce them by the URI value. A different
approach I came up with would be to give each document a third value
consisting of a combination of time and URI and to use that as the
key.
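
The first idea, done in two steps, might look like the sketch below: a view keyed by time alone selects the interval, and the grouping by URI then happens on the returned rows, either in the client or after copying them to a new database as Chris suggests. This is only a sketch under those assumptions; the row data and the countByUri helper are hypothetical.

```javascript
// Step 1 (inside CouchDB): key by time only, so that startkey and endkey
// bound the interval directly. emit is supplied by CouchDB's view server;
// the function is only defined here, not called.
var mapByTime = function (doc) { emit(doc.Time, doc.URI); };

// Step 2 (outside CouchDB): count the rows of that range query per URI.
function countByUri(rows) {
  var counts = Object.create(null); // no inherited properties to collide with
  rows.forEach(function (row) {
    counts[row.value] = (counts[row.value] || 0) + 1;
  });
  return counts;
}

// Hypothetical rows, shaped like the "rows" array of a view response for
// ?startkey=1270826004.0&endkey=1270826011.0
var rows = [
  { id: "doc1", key: 1270826005.0, value: "/a" },
  { id: "doc2", key: 1270826007.0, value: "/b" },
  { id: "doc3", key: 1270826009.0, value: "/a" }
];

var counts = countByUri(rows);
```

The trade-off is that the second step is no longer incremental: the counts have to be recomputed for each interval queried.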

Maybe and hopefully there is even a third approach I am not thinking of.
I really appreciate the help.

Thanks