hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: how to access a mapper counter in reducer
Date Fri, 02 Dec 2011 21:04:56 GMT
Anurag,

The current counter APIs available inside a map or reduce task are write-only.  They are
not intended for reading data from other tasks; they exist to collect statistics about the
job as a whole.  If you use too many of them, the performance of the whole system can get
very bad, because they are stored in memory on the JobTracker.  There is also the potential
that a map task that has finished "successfully" later fails, if the node it ran on dies
before all of its map output has been fetched by all of the reducers.  A reducer could
therefore read counter data that is only partial or out of date.  You may be able to access
the counters through the Job API, but I would not recommend it.  I also think there may be
some issues with that approach if you have security enabled, but I don't know for sure.

If you have an optimization that really needs summary data from each mapper in all reducers,
then you should do it the map/reduce way.  When a mapper finishes, have it output one special
key/value pair per reducer with the statistics in it.  You can know how many reducers there
are because that is set in the configuration.  You then need a special partitioner to
recognize those summary key/value pairs and make sure that each one goes to the proper
reducer.  You also need a special comparator to make sure that these special keys are the very
first ones read by the reducer, so it has the data before processing anything else.
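
The routing and ordering rules described above can be sketched in plain Java, so the logic
can be run without Hadoop on the classpath.  This is only an illustration under stated
assumptions: the class name SummaryKeys, the marker prefix, and the "marker + reducer index"
key layout are all hypothetical and not from the original message.  In a real job the
partition logic would presumably go into a Partitioner subclass and the ordering into the
job's sort comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SummaryKeys {
    // Hypothetical marker prefix; assumed never to start a real data key.
    public static final String MARKER = "__summary__:";

    // Key a mapper would emit once per reducer when it finishes,
    // carrying the index of the reducer that should receive it.
    public static String summaryKey(int reducer) {
        return MARKER + reducer;
    }

    public static boolean isSummary(String key) {
        return key.startsWith(MARKER);
    }

    // Summary keys go to the reducer named in the key;
    // all other keys fall back to ordinary hash partitioning.
    public static int partition(String key, int numReducers) {
        if (isSummary(key)) {
            return Integer.parseInt(key.substring(MARKER.length())) % numReducers;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort order that places every summary key before every data key,
    // so a reducer sees the statistics before anything else.
    public static final Comparator<String> SUMMARY_FIRST = (a, b) -> {
        boolean sa = isSummary(a), sb = isSummary(b);
        if (sa != sb) {
            return sa ? -1 : 1;
        }
        return a.compareTo(b);
    };

    public static void main(String[] args) {
        int numReducers = 3;
        List<String> keys = new ArrayList<>(List.of("banana", summaryKey(1), "apple"));
        keys.sort(SUMMARY_FIRST);
        System.out.println(keys);                                   // summary key is listed first
        System.out.println(partition(summaryKey(1), numReducers));  // prints 1
    }
}
```

The statistics themselves would travel in the value of each summary pair; a reducer can then
treat every leading key that carries the marker as summary data and switch to normal
processing at the first key without it.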

I would also recommend that you don't try to store this data in HDFS.  On a large cluster you
can very easily DDoS the NameNode that way, and then your ops will yell at you, as they did
at me before I stopped doing it.  I have made the approach above work.  It is just a lot of
work to do it right.

--Bobby Evans


On 12/1/11 1:18 PM, "Markus Jelsma" <markus.jelsma@openindex.io> wrote:

Can you access it via the Job API?

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29

> Hi,
> I have a similar query.
>
> In fact, I sent it yesterday and am waiting for a response from anybody who
> might have done it.
>
> Thanks,
> Anurag Tangri
>
> 2011/11/30 rabbit_cheng <rabbit_cheng@126.com>
>
> > I have created a counter in the mapper to count something, and I want to
> > get the counter's value in the reducer phase. The code segment is as follows:
> >
> > public class MM extends Mapper<LongWritable, Text, Text, Text> {
> >     static enum TEST { pt }
> >
> >     @Override
> >     public void map(LongWritable key, Text values, Context context)
> >             throws IOException, InterruptedException {
> >         context.getCounter(TEST.pt).increment(1);
> >     }
> > }
> >
> > public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
> >     @Override
> >     protected void setup(Context context)
> >             throws IOException, InterruptedException {
> >         long ptValue = context.getCounter(MM.TEST.pt).getValue();
> >     }
> > }
> >
> > But what I get is always 0, i.e., the value of the variable ptValue is
> > always 0.
> > Does anybody know how to access a mapper counter in the reducer?

