hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing
Date Tue, 03 Apr 2018 18:54:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424441#comment-16424441 ]

Hudson commented on YARN-8035:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13920 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13920/])
YARN-8035. Uncaught exception in ContainersMonitorImpl during relaunch (szegedim: rev 2d06d885c84b2e4a3acb6d3e0c50d4870e37ca82)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java


> Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8035
>                 URL: https://issues.apache.org/jira/browse/YARN-8035
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-8035.001.patch, YARN-8035.002.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a new process
> is spawned. For resource monitoring, {{ContainersMonitorImpl}} will obtain the new PID post
> relaunch and initialize the process tree monitoring. As part of this initialization, a tag
> called {{ContainerPid}}, whose value is the PID for the container, is populated for the
> metrics associated with the container. If the prior container failed after its process started,
> the original PID will already be populated for the container, resulting in the {{MetricsException}}
> below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainersMonitorImpl while monitoring resource of container_1521201379995_0001_01_000002
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
> at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
> {{MetricsRegistry}} provides a {{tag}} method that allows for updating the value of an
> existing tag. Updating the value ensures that the PID associated with the container is that of
> the currently running process, which appears to be an appropriate fix. However, it's unclear
> how this tag might be used by other systems. I'm not finding any usage in Hadoop itself.
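
The failure and the proposed remedy described above can be illustrated with a minimal, self-contained sketch. It does not use Hadoop classes: {{TagRegistrySketch}}, its {{override}} flag, and the PID values are hypothetical stand-ins chosen for illustration, and the actual patch may differ. The sketch only shows why a second registration of {{ContainerPid}} fails on relaunch and how updating the existing tag avoids the exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics tag registry; this is NOT Hadoop's MetricsRegistry.
final class TagRegistrySketch {
    private final Map<String, String> tags = new LinkedHashMap<>();

    // Adds a tag. A duplicate name is rejected unless override is true,
    // in which case the existing value is replaced.
    TagRegistrySketch tag(String name, String value, boolean override) {
        if (!override && tags.containsKey(name)) {
            throw new IllegalStateException("Tag " + name + " already exists!");
        }
        tags.put(name, value);
        return this;
    }

    String get(String name) {
        return tags.get(name);
    }
}

final class RelaunchDemo {
    public static void main(String[] args) {
        TagRegistrySketch registry = new TagRegistrySketch();

        // First launch: the PID (made-up value) is recorded for the container.
        registry.tag("ContainerPid", "4242", false);

        // Relaunch: same container ID, new process. Without override the
        // second registration is rejected, mirroring the reported exception.
        try {
            registry.tag("ContainerPid", "5151", false);
        } catch (IllegalStateException e) {
            System.out.println("relaunch without override failed: " + e.getMessage());
        }

        // With override, the tag is updated to the PID of the running process.
        registry.tag("ContainerPid", "5151", true);
        System.out.println("ContainerPid=" + registry.get("ContainerPid"));
    }
}
```

The trade-off noted in the description still applies: once the value is overwritten, any external consumer of the tag sees only the most recent PID.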



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

