hadoop-yarn-issues mailing list archives

From "Shurong Mai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
Date Mon, 29 Apr 2019 11:58:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shurong Mai updated YARN-9518:
------------------------------
    Description: 
When I set the CGroups configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed, and the NodeManager logged the exceptions shown at the end of this description.

The key line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
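The log shows the executor "Writing to cgroup task files..." and then trying to open /sys/fs/cgroup/cpu, which is a directory; opening a directory for writing fails with EISDIR, which is where "Is a directory" comes from. The Java sketch below is illustrative only and is not YARN code: {{isUsableTasksFile}} is a hypothetical pre-flight check that would reject such a path, and the temporary paths merely stand in for real cgroup files.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TasksFileCheckSketch {
    /**
     * Hypothetical check (not part of YARN): a cgroup "tasks" file must be a
     * regular, writable file. A path such as /sys/fs/cgroup/cpu is a directory,
     * and opening a directory for writing fails with EISDIR ("Is a directory").
     */
    public static boolean isUsableTasksFile(Path path) {
        return Files.isRegularFile(path) && Files.isWritable(path);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins created in a temporary directory, not real cgroup paths.
        Path dir = Files.createTempDirectory("cpu");
        Path tasks = Files.createFile(dir.resolve("tasks"));
        System.out.println(isUsableTasksFile(dir));   // prints false: a directory
        System.out.println(isUsableTasksFile(tasks)); // prints true: a regular file we created
        Files.delete(tasks);
        Files.delete(dir);
    }
}
{code}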

After analysing the problem, I found the cause. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are laid out as follows:
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7, they are laid out as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
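/sys/fs/cgroup/cpu,cpuacct
{code}
"cpu" and "cpuacct" have been merged into a single "cpu,cpuacct" hierarchy, and /sys/fs/cgroup/cpu and /sys/fs/cgroup/cpuacct are symbolic links to it.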

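One way to discover which directory a controller is actually mounted on, whichever layout the host uses, is to read the cgroup (v1) entries in /proc/mounts and pick the hierarchy whose mount options list that controller. The sketch below is illustrative only: {{findControllerMount}} is a hypothetical helper, not YARN's API, and the sample mount lines are modeled on the two layouts above rather than copied from the affected hosts.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CgroupMountSketch {
    /**
     * Hypothetical helper (not YARN's API): given lines in /proc/mounts format
     * ("device mountpoint fstype options dump pass"), return the mount point of
     * the cgroup v1 hierarchy that has the given controller enabled.
     */
    public static Optional<String> findControllerMount(List<String> mountLines, String controller) {
        for (String line : mountLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !"cgroup".equals(fields[2])) {
                continue; // not a cgroup v1 mount
            }
            // The options field holds mount flags and controller names, comma-separated;
            // exact matching keeps "cpu" from matching "cpuacct".
            List<String> options = Arrays.asList(fields[3].split(","));
            if (options.contains(controller)) {
                return Optional.of(fields[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative sample lines, one list per layout described above.
        List<String> centos6 = Arrays.asList(
            "cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0",
            "cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0");
        List<String> centos7 = Arrays.asList(
            "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0");
        System.out.println(findControllerMount(centos6, "cpu").orElse("none")); // prints /sys/fs/cgroup/cpu
        System.out.println(findControllerMount(centos7, "cpu").orElse("none")); // prints /sys/fs/cgroup/cpu,cpuacct
    }
}
{code}

On the CentOS 7 layout both "cpu" and "cpuacct" resolve to /sys/fs/cgroup/cpu,cpuacct, a path that itself contains a comma, so anything that passes such paths around as comma-separated lists has to account for it.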
 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1554210318404_0042_01_000001 transitioned from LOCALIZED to RUNNING
2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Exit code from container container_1554210318404_0042_01_000001 is : 27
2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Exception from container-launch with container ID: container_1554210318404_0042_01_000001 and exit code: 27
ExitCodeException exitCode=27:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Exception from container-launch.
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Container id: container_1554210318404_0042_01_000001
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Exit code: 27
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Stack trace: ExitCodeException exitCode=27:
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.run(Shell.java:482)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Shell output: main : command provided 1
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
main : user is test_hadoop
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
main : requested yarn user is datadev
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Writing to cgroup task files...
2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory
2019-04-19 20:17:20,131 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container exited with a non-zero exit code 27
2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1554210318404_0042_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2019-04-19 20:17:20,133 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1554210318404_0042_01_000001
 {panel}

> can not use CGroups with YARN in centos7 
> -----------------------------------------
>
>                 Key: YARN-9518
>                 URL: https://issues.apache.org/jira/browse/YARN-9518
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>            Reporter: Shurong Mai
>            Priority: Major
>              Labels: cgroup, patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
