hadoop-hdfs-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: MAP_INPUT_RECORDS counter in the reducer
Date Wed, 18 Sep 2013 02:08:14 GMT
Or you can do the calculation in the reducer's close() method, although I am not sure whether
the Mapper's count is available in the reducer.
But even if it isn't, here is what you can do:
1) Save the JobConf reference in your Mapper's configure() method.
2) In the mapper's close() method, store the MAP_INPUT_RECORDS counter in the configuration
object as your own property.
3) Retrieve that property in the reducer's close() method; then you have both numbers at that time.

Date: Tue, 17 Sep 2013 09:49:06 -0400
Subject: Re: MAP_INPUT_RECORDS counter in the reducer
From: shahab.yunus@gmail.com
To: user@hadoop.apache.org

In the normal configuration, the issue here is that Reducers can start before all the Maps
have finished, so it is not possible to get the number (or make sense of it even if you are
able to).

Having said that, you can specifically make sure that Reducers don't start until all your
maps have completed. It will of course slow down your job. I don't know whether with this
option it will work or not, but you can try (until the experts offer some advice).
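
One way to hold reducers back until every map has finished is the reduce slow-start setting. The following is only a sketch, assuming Hadoop is on the classpath: the class and method names are hypothetical, and whether this makes the counter readable in the reducer is not confirmed in this thread. It sets both the older property name (Hadoop 1.x) and the newer one (Hadoop 2.x) to 1.0, meaning 100% of maps must complete before reducers are scheduled.

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper, not part of Hadoop; shown only to illustrate the setting.
final class SlowstartConfig {
    static void delayReducersUntilMapsFinish(Configuration conf) {
        // Hadoop 1.x property name: fraction of maps that must finish
        // before reducers are scheduled.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
        // Hadoop 2.x property name for the same setting.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
    }
}
```

Call it on the job's configuration before submitting the job.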


On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <yaron.gonen@gmail.com> wrote:

Hi,
Is there a way for the reducer to get the total number of input records to the map phase?
For example, I want the reducer to normalize a sum by dividing it by the number of records.
I tried getting the value of that counter by using the line:


in the reducer code, but I got 0.
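
The normalization described above is just the sum divided by the record count, which is why a counter that reads 0 is a problem. A minimal plain-Java sketch, with no Hadoop dependency; the names SumNormalizer and normalize are hypothetical, and the record count is assumed to have been obtained some other way:

```java
// Hypothetical illustration of the normalization step only.
final class SumNormalizer {
    // Divides the sum by the number of map input records.
    // Rejects a non-positive count, such as the 0 read from the counter,
    // instead of silently producing Infinity or NaN.
    static double normalize(double sum, long mapInputRecords) {
        if (mapInputRecords <= 0) {
            throw new IllegalArgumentException(
                "record count must be positive, got " + mapInputRecords);
        }
        return sum / mapInputRecords;
    }

    public static void main(String[] args) {
        System.out.println(normalize(10.0, 4L)); // prints 2.5
    }
}
```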

