hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
Date Mon, 09 Nov 2015 15:18:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996687#comment-14996687 ]

Jason Lowe commented on YARN-4051:

If I understand this correctly, we're saying that the problem described in YARN-4050 is holding
up the main event dispatcher and the NM is semi-hung, yet we want to hurry and register with
the ResourceManager before containers have recovered?  Seems to me we need to address the
problem described in YARN-4050 if possible (e.g.: skip HDFS operations if we recovered at
least one container in the running or completed states since we know it must have done HDFS
init in the previous NM instance).  Otherwise we are hacking around the fact that we registered
too soon and aren't able to properly handle the out-of-order events.  I'd much rather deal
with the root cause if possible than patch all the separate symptoms.
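
(Illustrative sketch, not part of any attached patch.) The skip condition suggested
above could look roughly like the following. All names here (RecoveryHdfsCheck,
RecoveredState, canSkipHdfsInit) are hypothetical and are not NodeManager APIs; the
only assumption is that each recovered container exposes a coarse state:

```java
import java.util.List;

public class RecoveryHdfsCheck {
    // Hypothetical stand-in for the state of a container recovered from the NM state store.
    public enum RecoveredState { NEW, RUNNING, COMPLETED }

    // Returns true if at least one recovered container reached RUNNING or COMPLETED.
    // Per the comment above, that implies the previous NM instance already did HDFS init,
    // so the new instance could skip those operations during recovery.
    public static boolean canSkipHdfsInit(List<RecoveredState> recovered) {
        for (RecoveredState state : recovered) {
            if (state == RecoveredState.RUNNING || state == RecoveredState.COMPLETED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One container was already running before the restart: HDFS init can be skipped.
        System.out.println(canSkipHdfsInit(List.of(RecoveredState.NEW, RecoveredState.RUNNING)));
        // Only NEW containers were recovered: no evidence of earlier HDFS init.
        System.out.println(canSkipHdfsInit(List.of(RecoveredState.NEW)));
    }
}
```

With no recovered containers the check returns false, so a fresh NM would still perform
the HDFS operations as it does today.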

> ContainerKillEvent is lost when container is In New State and is recovering
> ---------------------------------------------------------------------------
>                 Key: YARN-4051
>                 URL: https://issues.apache.org/jira/browse/YARN-4051
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: sandflee
>            Assignee: sandflee
>            Priority: Critical
>         Attachments: YARN-4051.01.patch, YARN-4051.02.patch, YARN-4051.03.patch
> As in YARN-4050, the NM event dispatcher is blocked and the container is in the New state.
> When we finish the application, the container is still alive even after the NM event
> dispatcher is unblocked.

This message was sent by Atlassian JIRA