hadoop-mapreduce-user mailing list archives

From Ravi Gummadi <gr...@yahoo-inc.com>
Subject Re: reducers run past 100% (does that problem still exist?)
Date Tue, 22 Jun 2010 09:57:46 GMT
A reduce task has 3 phases: the copy phase, the sort phase and the reduce 
phase. Each phase accounts for 33.33% of the reduce task's total progress. 
Which phase was your reducer in when you saw progress > 100%? (You can 
see the phase on the web UI, in the "state" column, after the ">" 
symbol.)
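
As a rough illustration of that arithmetic, here is a small sketch. The 
function and names are made up for this example and are not Hadoop's 
actual code; it only assumes that each of the three phases contributes an 
equal third to the overall figure.

```python
# Hypothetical helper: illustrates the one-third-per-phase arithmetic only.
PHASES = ("copy", "sort", "reduce")


def reduce_task_progress(phase, phase_fraction):
    """Return overall task progress given the current phase and the
    fraction of that phase completed (normally between 0.0 and 1.0)."""
    if phase not in PHASES:
        raise ValueError("unknown phase: %r" % (phase,))
    completed_phases = PHASES.index(phase)
    # Each finished phase counts as a full third; the current phase adds
    # its own fraction of a third. The fraction is deliberately not clamped.
    return (completed_phases + phase_fraction) / len(PHASES)


# Halfway through the sort phase.
print("{:.1%}".format(reduce_task_progress("sort", 0.5)))
# A reduce-phase fraction above 1.0 pushes the total past 100%.
print(reduce_task_progress("reduce", 1.2) > 1.0)
```

With this model, the overall figure can only exceed 100% if the fraction 
reported for the current phase itself exceeds 1.0, which is why knowing 
the phase narrows down where the miscalculation happens.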

If you see progress > 66.7% while the task is in the sort phase, then the 
problem could be in the merge progress calculation, which is already fixed 
in HADOOP-5210. Hadoop version 0.20.2 should already contain that fix.

Otherwise, if the 3rd phase of the reduce task (the reduce phase) starts 
at 66.7% as expected and progress then goes beyond 100%, the bug (in 
Hadoop) may be that progress is not calculated correctly for the case of 
"compressed input to reducer".
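
To show how compressed input could produce a figure above 100%, here is 
a sketch under one assumption: that the reduce-phase fraction is bytes 
consumed divided by total input bytes, with the numerator counted in 
decompressed bytes and the denominator in compressed bytes. I have not 
confirmed that this is what the Hadoop code does; the records and 
variable names are invented for the example.

```python
import zlib

# Invented sample records standing in for the reducer's input.
records = [b"row-%06d,some repetitive payload\n" % i for i in range(1000)]
raw = b"".join(records)
compressed = zlib.compress(raw)

# Assumed mismatch: the total is the compressed size, while the
# running count is in decompressed bytes.
total_bytes = len(compressed)
processed_bytes = 0
for record in records:
    processed_bytes += len(record)

phase_fraction = processed_bytes / total_bytes
# Copy and sort phases are complete (2 of 3), plus the reduce-phase fraction.
overall = (2 + phase_fraction) / 3

print(phase_fraction > 1.0)
print(overall > 1.0)
```

If something like this is happening, the overshoot should grow with the 
compression ratio of the map output.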


Friso van Vollenhoven wrote:
> Hi all,
> When I run long running map/reduce jobs the reducers run past 100% before reaching completion.
> Sometimes as far as up to 140%. I have searched the mailing list and other resources and noticed
> bug reports related to this when using map output compression, but all appear to be fixed
> by now.
> The job I am running reads sequence files from HDFS and in the reducer inserts records
> into HBase. The reducer has NullWritable as both output key and output value.
> Some additional info:
> - the job takes in total close to 60 hours to complete
> - there are 10 reducers
> - the map output is compressed using the default codec and block compression
> - speculative execution is turned off (otherwise we could be hitting HBase harder than
> - mapred.job.reuse.jvm.num.tasks = 1
> - io.sort.factor = 100
> - io.sort.record.percent = 0.3
> - io.sort.spill.percent = 0.9
> - mapred.inmem.merge.threshold = 100
> - mapred.job.reduce.input.buffer.percent = 1.0
> I am using Hadoop 0.20.2 on a small cluster (1x NN+JT, 4x DN+TT).
> Does anyone have a clue? Or can anyone tell me how the progress info for reducers is
> calculated? Any help is appreciated.
> Regards,
> Friso
