hadoop-yarn-dev mailing list archives

From "Tao Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-7751) Decommissioned NM leaves orphaned containers
Date Mon, 15 Jan 2018 10:01:01 GMT
Tao Yang created YARN-7751:
------------------------------

             Summary: Decommissioned NM leaves orphaned containers
                 Key: YARN-7751
                 URL: https://issues.apache.org/jira/browse/YARN-7751
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Tao Yang


Recently we found some orphaned containers running on a decommissioned NM in our production
cluster. The problem began with a PCIe error on this node: one of the local directories became
unwritable, so containers whose pid files were located on it could not be cleaned up successfully.
A few moments later, the NM changed to the DECOMMISSIONED state and exited.

The corresponding NM logs:

{noformat}

2018-01-12 21:31:38,495 WARN [DiskHealthMonitor-Timer] org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection:
Directory /dump/2/nm-logs error, Directory is not writable: /dump/2/nm-logs, removing from
list of valid directories

2018-01-12 21:41:23,352 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_e37_1508697357114_216838_01_001812
2018-01-12 21:41:25,601 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Could not get pid for container_e37_1508697357114_216838_01_001812. Waited for 2000 ms.

{noformat}
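
To illustrate the failure mode suggested by the logs above, here is a minimal sketch of a bounded wait for a container's pid file. It is not the actual NodeManager code: the class name PidFileWait, the method waitForPid, and the poll interval are hypothetical, and only the 2000 ms timeout is taken from the log. The point is that when the pid file sits on a directory that can no longer be accessed, the wait ends without a pid, so the cleanup path presumably has no process to signal and the container can outlive the NM.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PidFileWait {

    /**
     * Polls for a pid file until it can be read or the timeout elapses.
     * Returns the pid as a string, or null if no pid could be read in time.
     * (Hypothetical helper for illustration only.)
     */
    static String waitForPid(Path pidFile, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            try {
                if (Files.isReadable(pidFile)) {
                    String pid = new String(
                            Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
                    if (!pid.isEmpty()) {
                        return pid;
                    }
                }
            } catch (IOException e) {
                // A failing disk may throw here; treat it as "pid not available yet".
            }
            if (System.nanoTime() >= deadline) {
                return null;
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("nm-sketch");

        // Simulates the bad-disk case: the pid file never becomes readable.
        String pid = waitForPid(dir.resolve("missing.pid"), 2000, 100);
        if (pid == null) {
            // With no pid there is nothing to send a signal to, so the
            // container process would keep running (orphaned).
            System.out.println("Could not get pid. Waited for 2000 ms.");
        }

        // Healthy case: the pid file exists and the process could be signalled.
        Path ok = dir.resolve("ok.pid");
        Files.write(ok, "12345\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("pid = " + waitForPid(ok, 2000, 100));
    }
}
```

A design that avoids orphans in this situation would need a fallback for the null case, for example locating the container process by some means other than the pid file before the NM exits; that fallback is outside the scope of this sketch.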

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

