hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch
Date Sat, 31 Mar 2018 12:15:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421294#comment-16421294
] 

Shane Kumpf commented on YARN-8037:
-----------------------------------

Reopening after more testing. This is not limited to relaunch, I'm finding that I get these
exceptions for every single application I run (MR pi, MR sleep, dshell, native services),
even when the container completes successfully. I'd like to explore ways to eliminate this.

{code}
2018-03-31 12:09:48,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container container_1522497044327_0004_01_000001 succeeded
2018-03-31 12:09:48,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1522497044327_0004_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
2018-03-31 12:09:48,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1522497044327_0004_01_000001
2018-03-31 12:09:50,185 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse 13187
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 13187
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1522497044327_0004_01_000001/cpuacct.stat
(No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
	... 4 more
2018-03-31 12:09:50,186 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse cgroups /sys/fs/cgroup/memory/hadoop-yarn/container_1522497044327_0004_01_000001/memory.memsw.usage_in_bytes
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 13187
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/memory/hadoop-yarn/container_1522497044327_0004_01_000001/memory.usage_in_bytes
(No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
	... 4 more
2018-03-31 12:09:50,186 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.audit:
Resource usage of ProcessTree 13187 for container-id container_1522497044327_0004_01_000001:
-1B of 1.5 GB physical memory used; -1B of 3.1 GB virtual memory used CPU:0.000000 CPU/core:0.000000
2018-03-31 12:09:50,220 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoopuser
OPERATION=Container Finished - Succeeded	TARGET=ContainerImpl	RESULT=SUCCESS	APPID=application_1522497044327_0004
CONTAINERID=container_1522497044327_0004_01_000001
2018-03-31 12:09:50,220 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1522497044327_0004_01_000001 transitioned from EXITED_WITH_SUCCESS to
DONE
{code}

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> -----------------------------------------------------------------------
>
>                 Key: YARN-8037
>                 URL: https://issues.apache.org/jira/browse/YARN-8037
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using the {{CGroupsResourceCalculator}}
this results in the warning and exception below being logged every second until the relaunch
occurs, which is excessive and filling up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim
12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_000002/cpuacct.stat
(No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse cgroups /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim
12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.usage_in_bytes
(No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a minimum. Alternatively,
it may make sense to stop the existing {{MonitoringThread}} during relaunch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message