hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4982) AM hung with one pending map task
Date Fri, 08 Feb 2013 18:55:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574723#comment-13574723 ]

Jason Lowe commented on MAPREDUCE-4982:

bq. getContainerReqToReplace seems to be removing and re-adding entries into 'maps'.

Aha, that's the missing piece.  Great catch!  There was some blacklisting going on with lots
of containers being allocated on blacklisted nodes.  getContainerReqToReplace probably moved
the initial attempt of one map task after the failed map attempt of another.
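
To illustrate the reordering effect described above, here is a minimal sketch. It assumes 'maps' behaves like an insertion-ordered map such as java.util.LinkedHashMap; the class name, method name, attempt names, and values are hypothetical and are not taken from the AM code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReorderSketch {
    // Hypothetical stand-in for the pending map requests, keyed by attempt name.
    // Assumption: the real 'maps' structure iterates in insertion order.
    static List<String> orderAfterReAdd() {
        Map<String, String> maps = new LinkedHashMap<>();
        maps.put("mapA_attempt0", "initial attempt");       // initial attempt of one map task
        maps.put("mapB_attempt1", "retry after failure");   // retry for a failed attempt of another task

        // Removing an entry and putting it back moves it to the end of the
        // iteration order, so the initial attempt now follows the retry.
        String request = maps.remove("mapA_attempt0");
        maps.put("mapA_attempt0", request);

        return new ArrayList<>(maps.keySet());
    }

    public static void main(String[] args) {
        System.out.println(orderAfterReAdd()); // prints [mapB_attempt1, mapA_attempt0]
    }
}
```

With an insertion-ordered map, any code that later hands out containers by walking the entries from the front would see the retry first, which matches the ordering change suspected here.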

bq. Alternately, could speculation cause this - was it enabled for this job?

I think speculation could also be an opportunity for fast fail map attempts to get ahead of
normal map attempts, but it was not enabled in this particular case.
> AM hung with one pending map task
> ---------------------------------
>                 Key: MAPREDUCE-4982
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4982
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.6
>            Reporter: Jason Lowe
> Saw a job that hung with one pending map task that never ran.  The task was in the SCHEDULED
> state with a single attempt that was in the UNASSIGNED state.  The AM looked like it was waiting
> for a container from the RM, but the RM was never granting it the one container it needed.
> I suspect the AM botched the container request bookkeeping somehow.  More details to

