hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP
Date Fri, 21 Dec 2012 22:19:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538472#comment-13538472 ]

Jason Lowe commented on MAPREDUCE-4833:

+1, thanks for writing a test.
> Task can get stuck in FAIL_CONTAINER_CLEANUP
> --------------------------------------------
>                 Key: MAPREDUCE-4833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.5
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE4833-1.patch, MAPREDUCE4833-2.patch, MAPREDUCE4833.patch
> If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl
> can get stuck in an RPC timeout.  At the same time, the RM may notice that the NM has gone
> away and inform the AM, which triggers a TA_FAILMSG.  If the TA_FAILMSG arrives at
> the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, the task attempt will
> try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED
> event, leaving the attempt stuck.
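
The ordering problem in the description can be illustrated with a toy state machine. This is only a sketch: the class name StuckAttemptSketch, the replay helper, and the reduced transition table are assumptions made for illustration, not the actual TaskAttemptImpl or ContainerLauncherImpl code. It models three states and three events, and assumes that a launch failure arriving during cleanup is simply ignored.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: the states, events, and transitions below are simplified
// assumptions, not the real TaskAttemptImpl transition table.
class StuckAttemptSketch {
    enum State { ASSIGNED, FAIL_CONTAINER_CLEANUP, FAILED }
    enum Event { TA_FAILMSG, TA_CONTAINER_LAUNCH_FAILED, TA_CONTAINER_CLEANED }

    // Applies one event to the current state and returns the next state.
    static State next(State s, Event e) {
        switch (s) {
            case ASSIGNED:
                if (e == Event.TA_FAILMSG) return State.FAIL_CONTAINER_CLEANUP;
                if (e == Event.TA_CONTAINER_LAUNCH_FAILED) return State.FAILED;
                return s;
            case FAIL_CONTAINER_CLEANUP:
                // Only a cleanup confirmation moves the attempt forward here.
                if (e == Event.TA_CONTAINER_CLEANED) return State.FAILED;
                return s; // assumed: any other event is ignored in this state
            default:
                return s; // FAILED is terminal in this sketch
        }
    }

    // Replays a sequence of events starting from ASSIGNED.
    static State replay(List<Event> events) {
        State s = State.ASSIGNED;
        for (Event e : events) {
            s = next(s, e);
        }
        return s;
    }

    public static void main(String[] args) {
        // Launch failure is seen first: the attempt reaches FAILED.
        System.out.println(replay(Arrays.asList(
            Event.TA_CONTAINER_LAUNCH_FAILED, Event.TA_FAILMSG)));
        // TA_FAILMSG wins the race and no TA_CONTAINER_CLEANED ever arrives:
        // the attempt stays in FAIL_CONTAINER_CLEANUP.
        System.out.println(replay(Arrays.asList(
            Event.TA_FAILMSG, Event.TA_CONTAINER_LAUNCH_FAILED)));
    }
}
```

In this model the second ordering never leaves FAIL_CONTAINER_CLEANUP, because the only event that exits that state is one the launcher never sends after a failed launch.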

