Message-ID: <4BF4F754.7080806@microlution.de>
Date: Thu, 20 May 2010 10:48:20 +0200
From: "Kropp, Henning"
To: user@couchdb.apache.org
Subject: Re: Multiple map reduce stages

On 18.05.2010 20:16, J Chris Anderson wrote:
> On May 18, 2010, at 2:52 AM, Kropp, Henning wrote:
>
>> Hi,
>>
>> as far as I know working with map reduce commonly involves multiple map
>> and reduce stages. A view in couchdb solely consists of one map and if
>> necessary one reduce stage!? To have multiple map and reduce stages one
>> would have to conjunct views in couchdb!? How can I do that? Is it
>> possible to give the function(doc){..} another parameter? There is the
>> shows which have the extra parameter req for the http request.
>> Unfortunately my javascript knowledge of the underlaying Prototype
>> concept is not very funded which could be helpful here?
>>
>> Kind regards and many thanks in advanced
>>
>
> CouchDB Map Reduce is a realtime incremental model, so it is quite different from the Hadoop-style batch model. Of course you can still chain map reduce by copying the rows from a view query to a new db, and writing another view on the new db.
>
> Chris

That is interesting to know. Hive adopts the batch model but obviously serves a different purpose. I was asking because of an actual problem I am having; maybe someone can help.

I would like to group documents by a value, but only those documents within a certain time interval. In this scenario CouchDB is used for logging, which might not be a purpose CouchDB was initially designed for. I came up with the following solution.
I group by value (URI) and time, using group_level=1 and a start and end key as follows:

    /_temp_view?group=true&group_level=1&startkey=[1270826004.0]&endkey=[{},1270826011.0]

and simply count:

    function(doc) {
      emit([doc.URI, doc.Time], 1);
    }

Experienced CouchDB users might already see that this results in all documents being grouped, regardless of the times set in the start and end key. It took me some time to figure out why, and I finally realized what the problem is, even though I cannot explain it properly, and maybe I am totally wrong after all.

So I thought it might help to first map the documents by the time value and, in a next step, map and reduce them by the URI value. A different approach I came up with would be to add a third value to each document, consisting of a combination of time and URI, and to work with that as the key!? Maybe, and hopefully, there is even a third approach I am not thinking of.

I really appreciate the help.

Thanks
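
To make the two-step idea (first select by time, then group and count by URI) more concrete, here is a rough sketch in plain JavaScript. It runs outside CouchDB and uses no CouchDB API. The sample documents and the helper names mapByTime, selectRange and countByUri are hypothetical, and the first step only assumes that a view with emit(doc.Time, doc.URI) would yield rows of this shape.

```javascript
// Hypothetical sample log documents; only the field names URI and Time
// are taken from the map function in this message.
const docs = [
  { URI: "/index.html", Time: 1270826003.0 },
  { URI: "/index.html", Time: 1270826005.0 },
  { URI: "/about.html", Time: 1270826006.0 },
  { URI: "/index.html", Time: 1270826010.0 },
  { URI: "/about.html", Time: 1270826012.0 },
];

// Step 1: build rows keyed by time, with the URI as value, sorted by key.
// This mimics what a map function doing emit(doc.Time, doc.URI) might return.
function mapByTime(documents) {
  return documents
    .map((doc) => ({ key: doc.Time, value: doc.URI }))
    .sort((a, b) => a.key - b.key);
}

// Keep only the rows whose key lies in [startkey, endkey], ends inclusive.
function selectRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Step 2: group the remaining rows by URI and count them.
function countByUri(rows) {
  const counts = {};
  for (const row of rows) {
    counts[row.value] = (counts[row.value] || 0) + 1;
  }
  return counts;
}

const inRange = selectRange(mapByTime(docs), 1270826004.0, 1270826011.0);
const counts = countByUri(inRange);
console.log(counts);
```

In this sketch the time filter is applied before any grouping, so only the three sample documents inside the interval are counted. Whether the second step is better done on the client or in a second view on a new database is left open here.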