hadoop-mapreduce-user mailing list archives

From Christopher Egner <ceg...@apple.com>
Subject Re: Combiner timing out
Date Sat, 05 Nov 2011 00:13:35 GMT
I'm using CDH3u0 and streaming, so this is hadoop-0.20.2 at patch level 923.21 (cf https://ccp.cloudera.com/display/DOC/Downloading+CDH+Releases).

I modified the streaming code to confirm that it calls progress when I ask it to and to check
which Reporter class is actually being used.  It's the Task.TaskReporter class for map and
reduce but the Reporter.NULL class for combine (both map-side and reduce-side combines). 
It appears to be the mapred layer (as opposed to streaming) that sets the reporter, so this
should affect non-streaming jobs as well.


On Nov 4, 2011, at 9:11 AM, Robert Evans wrote:

> There was a change that went into 0.20.205 (https://issues.apache.org/jira/browse/MAPREDUCE-2187)
that automatically reports progress after a set number of inputs to the combiner.  I looked through
the code for 0.20.205 and from what I can see the CombineOutputCollector should be getting
an instance of TaskReporter.  What version of Hadoop are you running?  Are you using the old
APIs in the mapred package or the newer APIs in the mapreduce java package?
> --Bobby Evans
> On 11/4/11 1:20 AM, "Christopher Egner" <cegner@apple.com> wrote:
> Hi all,
> Let me preface this with my understanding of how tasks work.
> If a task takes a long time (default 10min) and demonstrates no progress, the task tracker
will decide the process is hung, kill it, and start a new attempt.  Normally, one uses a Reporter
instance's progress method to provide progress updates and avoid this. For a streaming mapper,
the Reporter class is org.apache.hadoop.mapred.Task$TaskReporter and this works well.  Streaming
is even set up to take progress, status, and counter updates from stderr, which is really convenient.
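For reference, a minimal sketch of that stderr protocol in a streaming mapper; the identity-mapper logic, the `MyJob` counter group, and the `RECORDS` counter name are illustrative, not from this thread:

```python
import sys

def status(msg):
    # Streaming treats stderr lines of this form as status updates.
    return "reporter:status:%s" % msg

def counter(group, name, amount=1):
    # ...and lines of this form as counter increments.
    return "reporter:counter:%s,%s,%d" % (group, name, amount)

def run(lines, every=10000):
    """Identity-mapper pass that heartbeats every `every` records.
    Returns (output records, reporter lines) so the sketch is easy to test;
    a real mapper would write the records to sys.stdout and the reporter
    lines to sys.stderr instead."""
    out, err = [], []
    for n, line in enumerate(lines, 1):
        out.append(line)  # pass records through unchanged
        if n % every == 0:
            err.append(status("processed %d records" % n))
            err.append(counter("MyJob", "RECORDS", every))
    return out, err

# In a real job: for each batch, print records and write the
# status()/counter() lines to sys.stderr as the heartbeat.
```

These updates keep a streaming map or reduce task alive past mapred.task.timeout, but per the observation above they do not help the combine phase while its reporter is Reporter.NULL.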
> However, for combiner tasks, the class is org.apache.hadoop.mapred.Reporter$1, the anonymous
inner class behind Reporter.NULL, which ignores all updates.  So even if a combiner task updates
its reporter in accordance with the docs (see postscript), its updates are ignored and it is
killed at the 10-minute mark.  The alternative is to set mapred.task.timeout very high, which
allows truly hung tasks to go unrecognised for much longer.
> At least this is what I've been able to put together from reading code and searching
the web for docs (except hadoop jira which has been down for a while - my bad luck).
> So am I understanding this correctly?  Are there plans to change this?  Or are there reasons
that combiners can't have normal reporters associated with them?
> Thanks for any help,
> Chris
> http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Reporter
> http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/ (cf tip 7)
> http://hadoop.apache.org/common/docs/r0.18.3/streaming.html#How+do+I+update+counters+in+streaming+applications%3F
> http://hadoop.apache.org/common/docs/r0.20.0/mapred-default.html  (cf mapred.task.timeout)
