Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 16 Mar 2016 18:37:33 +0000 (UTC)
From: "Siqi Li (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12950932.1458151665000.39232.1458153453540@Atlassian.JIRA>
In-Reply-To: <JIRA.12950932.1458151665000@Atlassian.JIRA>
References: <JIRA.12950932.1458151665000@Atlassian.JIRA>
 <JIRA.12950932.1458151665861@arcas>
Subject: [jira] [Commented] (YARN-4831) Recovered containers will be killed
 after NM stateful restart
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197895#comment-15197895 ] 

Siqi Li commented on YARN-4831:
-------------------------------

When the NM does a stateful restart, ContainerManagerImpl tries to recover applications and containers, and then sends an ApplicationFinishEvent to each app in appsState.getFinishedApplications().

The ApplicationFinishEvent can cause newly recovered containers to transition from NEW to DONE via KillOnNewTransition, so a container that had already completed before the restart is reported as killed.
We could add a check in KillOnNewTransition to avoid killing completed containers.
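
A rough sketch of the kind of check I mean is below. It is a simplified, self-contained model, not the actual NodeManager code: the class KillOnNewSketch, the RecoveredStatus enum, the auditedAsKilled flag, and the KILLED_EXIT_CODE value are all hypothetical stand-ins chosen for illustration, and the real KillOnNewTransition in ContainerImpl has a different shape.

```java
// Simplified model for illustration only. None of these types are the real
// Hadoop classes; names and values are assumptions made for this sketch.
class KillOnNewSketch {
    enum State { NEW, DONE }

    // Hypothetical stand-in for the status read back from the NM state store.
    enum RecoveredStatus { REQUESTED, LAUNCHED, COMPLETED }

    // Hypothetical exit code used when the NM kills a container itself.
    static final int KILLED_EXIT_CODE = -1;

    static class Container {
        final String id;
        final RecoveredStatus recoveredStatus;
        State state = State.NEW;
        int exitCode;                    // exit code recovered from the state store
        boolean auditedAsKilled = false; // models the "Container Finished - Killed" audit entry

        Container(String id, RecoveredStatus recoveredStatus, int recoveredExitCode) {
            this.id = id;
            this.recoveredStatus = recoveredStatus;
            this.exitCode = recoveredExitCode;
        }
    }

    // Models the NEW -> DONE transition triggered by the application finish event.
    static void killOnNew(Container c) {
        // Proposed guard: a container recovered as COMPLETED keeps its original
        // exit code and is not reported as killed.
        if (c.recoveredStatus != RecoveredStatus.COMPLETED) {
            c.exitCode = KILLED_EXIT_CODE;
            c.auditedAsKilled = true;
        }
        c.state = State.DONE;
    }

    public static void main(String[] args) {
        Container completed = new Container("container_A", RecoveredStatus.COMPLETED, 0);
        Container pending = new Container("container_B", RecoveredStatus.REQUESTED, 0);
        killOnNew(completed);
        killOnNew(pending);
        System.out.println(completed.id + " killed=" + completed.auditedAsKilled);
        System.out.println(pending.id + " killed=" + pending.auditedAsKilled);
    }
}
```

In this model both containers still end up in DONE, but only the one that had not completed before the restart is marked as killed.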

> Recovered containers will be killed after NM stateful restart 
> --------------------------------------------------------------
>
>                 Key: YARN-4831
>                 URL: https://issues.apache.org/jira/browse/YARN-4831
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siqi Li
>
> {code}
> 2016-03-04 19:43:48,130 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1456335621285_0040_01_000066 transitioned from NEW to DONE
> 2016-03-04 19:43:48,130 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service	OPERATION=Container Finished - Killed	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1456335621285_0040
> {code}


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)