Subject: Re: MAP_INPUT_RECORDS counter in the reducer
From: Shahab Yunus <shahab.yunus@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 18 Sep 2013 09:46:48 -0400

Yes, you are correct that the copy phase starts while the maps are running and the reduce function is not called until everything is done, but aren't the Reduce tasks also already 'initialized' at that point? Which, as far as I know (and I might be wrong), will not have the map input records counter yet (which was my point)?

Regards,
Shahab

On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:

> Shahab,
>
> One question - You mentioned - "In the normal configuration, the issue
> here is that Reducers can start before all the Maps have finished so it is
> not possible to get the number (or make sense of it even if you are able
> to.)"
>
> I think reducers would start copying the data from the completed map
> tasks, but will not start the actual reduce process until the data from all
> the mappers has been pulled in.
>
> So the call to the counter Yaron has made might work, if invoked from the
> reduce method.
>
> Thanks,
> Rahul
>
> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <java8964@hotmail.com> wrote:
>
>> Or you could do the calculation in the reducer close() method, even though I am
>> not sure you can get the Mapper's count in the reducer.
>>
>> But even if you can't, here is what you can do:
>> 1) Save the JobConf reference in your Mapper configure() method.
>> 2) Store the MAP_INPUT_RECORDS counter value in the configuration object as
>> your own property, in the close() method of the mapper.
>> 3) Retrieve that property in the reducer close() method; then you have
>> both numbers at that time.
>>
>> Yong
>>
>> ------------------------------
>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>> From: shahab.yunus@gmail.com
>> To: user@hadoop.apache.org
>>
>> In the normal configuration, the issue here is that Reducers can start
>> before all the Maps have finished, so it is not possible to get the number
>> (or make sense of it even if you are able to.)
>>
>> Having said that, you can specifically make sure that Reducers don't
>> start until all your maps have completed. It will of course slow down your
>> job. I don't know whether it will work with this option, but you can
>> try (until the experts have better advice.)
>>
>> Regards,
>> Shahab
>>
>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen wrote:
>>
>> Hi,
>> Is there a way for the reducer to get the total number of input records
>> to the map phase?
>> For example, I want the reducer to normalize a sum by dividing it by the
>> number of records. I tried getting the value of that counter by using the
>> line:
>>
>> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>>
>> in the reducer code, but I got 0.
>>
>> Thanks!
>> Yaron
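Yong's three steps can be sketched in a single process. This toy (no Hadoop dependencies; the class and property names are made up) uses a java.util.Properties object as a stand-in for the JobConf. One caveat worth stating loudly: in a real distributed job each task works on its own deserialized copy of the configuration, so a property set in the mapper's close() is only visible to the reducer when both run against the same in-memory conf (e.g. under the local job runner).

```java
import java.util.Properties;

public class CounterHandoffSketch {
    // Stand-in for the JobConf. In a real cluster every task gets its own
    // copy of the configuration, so this in-memory hand-off illustrates the
    // steps, not distributed behavior.
    static final Properties conf = new Properties();

    static long mapInputRecords = 0;

    // Steps 1 and 2: the mapper keeps the conf reference and, in close(),
    // stores its record count under a custom (hypothetical) property name.
    static void mapperClose() {
        conf.setProperty("my.map.input.records", Long.toString(mapInputRecords));
    }

    // Step 3: the reducer reads the property back.
    static long reducerReadCount() {
        return Long.parseLong(conf.getProperty("my.map.input.records", "0"));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) mapInputRecords++; // "map" five records
        mapperClose();
        System.out.println(reducerReadCount()); // prints 5
    }
}
```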
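The option Shahab alludes to is the reduce "slowstart" setting. A sketch of the mapred-site.xml entry, assuming the Hadoop 1.x property name (in Hadoop 2.x it is mapreduce.job.reduce.slowstart.completedmaps):

```xml
<property>
  <!-- Fraction of the map tasks that must complete before reduce tasks
       are scheduled; 1.0 means reducers launch only after every map is done. -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.0</value>
</property>
```

Note that this only delays when reduce tasks are launched; it does not by itself make the mappers' counters visible to them.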