hadoop-mapreduce-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject reducers run past 100% (does that problem still exist?)
Date Mon, 21 Jun 2010 15:44:49 GMT
Hi all,

When I run long-running map/reduce jobs, the reported reducer progress goes past 100% before
the reducers complete, sometimes as far as 140%. I have searched the mailing list and other
resources and found bug reports related to this when using map output compression, but all
of them appear to be fixed by now.

The job I am running reads sequence files from HDFS and, in the reducer, inserts records into
HBase. The reducer uses NullWritable as both output key and output value.
Some additional info:
- the job takes close to 60 hours in total to complete
- there are 10 reducers
- the map output is compressed using the default codec and block compression
- speculative execution is turned off (otherwise we could be hitting HBase harder than necessary)
- mapred.job.reuse.jvm.num.tasks = 1
- io.sort.factor = 100
- io.sort.record.percent = 0.3
- io.sort.spill.percent = 0.9
- mapred.inmem.merge.threshold = 100
- mapred.job.reduce.input.buffer.percent = 1.0
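
For anyone trying to reproduce this setup, the settings listed above can be collected in one place and rendered as -D key=value options. This is only a minimal sketch: the property names marked "assumed" do not appear in this message and are my best guess at the 0.20-era names for the described settings, and the helper to_generic_options is a hypothetical name used for illustration.

```python
# Settings from the list above, as property name -> value.
# Names marked "assumed" are not in the message; they are guesses at the
# matching 0.20-era property names.
job_settings = {
    "mapred.reduce.tasks": "10",                           # assumed name for "10 reducers"
    "mapred.compress.map.output": "true",                  # assumed name for map output compression
    "mapred.map.tasks.speculative.execution": "false",     # assumed
    "mapred.reduce.tasks.speculative.execution": "false",  # assumed
    "mapred.job.reuse.jvm.num.tasks": "1",
    "io.sort.factor": "100",
    "io.sort.record.percent": "0.3",
    "io.sort.spill.percent": "0.9",
    "mapred.inmem.merge.threshold": "100",
    "mapred.job.reduce.input.buffer.percent": "1.0",
}


def to_generic_options(settings):
    """Render settings as a flat list of -D key=value arguments."""
    args = []
    for key, value in settings.items():
        args.extend(["-D", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the options on one line so they can be pasted into a command.
    print(" ".join(to_generic_options(job_settings)))
```

The printed options are only picked up from the command line if the job driver parses generic options (for example by running through ToolRunner); otherwise the same values would have to be set on the job configuration in code.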

I am using Hadoop 0.20.2 on a small cluster (1x NN+JT, 4x DN+TT).

Does anyone have a clue? Or can anyone tell me how the progress info for reducers is calculated?
Any help is appreciated.
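
To make the question concrete, here is a minimal sketch of how a ratio-based progress figure could end up above 100%. Everything in it is an assumption for illustration, not a description of what Hadoop 0.20.2 actually does: the three equally weighted phases, the function names, and the byte counts are all hypothetical. The idea is simply that if the expected total is measured on compressed data while the processed count is measured on decompressed data, the ratio is not bounded by 1.

```python
def phase_progress(bytes_processed, bytes_expected):
    """Unclamped ratio of work done to work expected for one phase."""
    if bytes_expected <= 0:
        return 0.0
    return bytes_processed / bytes_expected


def reducer_progress(copy, sort, reduce):
    """Average of three equally weighted phases (an assumption)."""
    return (copy + sort + reduce) / 3.0


# Hypothetical numbers: the expected total comes from compressed sizes,
# while the processed count is measured on decompressed data.
compressed_total = 1_000
decompressed_read = 2_200

reduce_phase = phase_progress(decompressed_read, compressed_total)
overall = reducer_progress(1.0, 1.0, reduce_phase)

if __name__ == "__main__":
    print(f"overall reducer progress: {overall:.0%}")
```

The byte counts were picked so that the result lands near the 140% mentioned above. Clamping each phase with min(1.0, ratio) would hide the symptom but not fix the mismatch between the two measurements.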

