hadoop-mapreduce-issues mailing list archives

From "Robert Parker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP
Date Tue, 18 Dec 2012 20:40:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535288#comment-13535288 ]

Robert Parker commented on MAPREDUCE-4833:

Previously, when a kill arrived for a Container that was already DONE, the Container returned
without sending an event (essentially a no-op). This patch sends a TA_CONTAINER_CLEANED event
in all cases.
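
The change in kill handling can be illustrated with a minimal sketch. This is a simplified, hypothetical model, not the code in the attached patch: the class ContainerKillSketch, its nested types, and the methods killOld and killPatched are invented for illustration. Only the DONE state and the TA_CONTAINER_CLEANED event name come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model; not the actual ContainerLauncherImpl code. */
public class ContainerKillSketch {
    /** Assumed subset of container states; only DONE is named in the comment. */
    enum ContainerState { PREP, RUNNING, DONE }

    /** Event reported back to the task attempt (name taken from the issue text). */
    enum AttemptEvent { TA_CONTAINER_CLEANED }

    static class Container {
        ContainerState state;
        /** Records the events this container has sent to the task attempt. */
        final List<AttemptEvent> sent = new ArrayList<>();

        Container(ContainerState state) {
            this.state = state;
        }

        /** Behavior before the patch: a kill on a DONE container sends nothing. */
        void killOld() {
            if (state == ContainerState.DONE) {
                return; // effectively a no-op; the attempt never hears back
            }
            state = ContainerState.DONE;
            sent.add(AttemptEvent.TA_CONTAINER_CLEANED);
        }

        /** Behavior described for the patch: cleanup is always reported. */
        void killPatched() {
            if (state != ContainerState.DONE) {
                state = ContainerState.DONE; // stop the container if still active
            }
            sent.add(AttemptEvent.TA_CONTAINER_CLEANED);
        }
    }

    public static void main(String[] args) {
        Container before = new Container(ContainerState.DONE);
        before.killOld();
        Container after = new Container(ContainerState.DONE);
        after.killPatched();
        System.out.println("old events: " + before.sent.size());     // prints "old events: 0"
        System.out.println("patched events: " + after.sent.size());  // prints "patched events: 1"
    }
}
```

In this model, a task attempt waiting for TA_CONTAINER_CLEANED would wait forever after killOld on a DONE container, which corresponds to the stuck FAIL_CONTAINER_CLEANUP state described in the issue.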
> Task can get stuck in FAIL_CONTAINER_CLEANUP
> --------------------------------------------
>                 Key: MAPREDUCE-4833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.5
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Parker
>            Priority: Critical
>         Attachments: MAPREDUCE4833-23.patch
> If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl
> can get stuck in an RPC timeout.  At the same time, the RM may notice that the NM has gone
> away and inform the AM of this, which triggers a TA_FAILMSG.  If the TA_FAILMSG arrives at
> the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, then the task attempt will
> try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED
> event, causing the attempt to be stuck.

