hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8037) CGroupsResourceCalculator excessive warnings on container relaunch
Date Fri, 16 Mar 2018 16:22:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402121#comment-16402121
] 

Miklos Szegedi commented on YARN-8037:
--------------------------------------

Thank you, [~shanekumpf@gmail.com] for raising this. [~haibochen], we had a logic in one of
earlier patches in YARN-7064 to remove multiple reporting of issues from CGroupsResourceCalculator
and we removed based on your advice to support debugging. What is your opinion about this
suggestion? Do you think we should add back some filtering here? https://issues.apache.org/jira/browse/YARN-7064?focusedCommentId=16323135&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16323135

 

> CGroupsResourceCalculator excessive warnings on container relaunch
> ------------------------------------------------------------------
>
>                 Key: YARN-8037
>                 URL: https://issues.apache.org/jira/browse/YARN-8037
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using the {{CGroupsResourceCalculator}}
this results in the warning and exception below being logged every second until the relaunch
occurs, which is excessive and filling up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim
12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_000002/cpuacct.stat
(No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
Failed to parse cgroups /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim
12844
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.usage_in_bytes
(No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a minimum. Alternatively,
it may make sense to stop the existing {{MonitoringThread}} during relaunch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message