hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted
Date Fri, 16 Mar 2018 16:41:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402162#comment-16402162 ]

Miklos Szegedi commented on YARN-8031:

[~jayceAu], thank you for raising this. If you have CGroups already mounted, you should set
the mount option to false as described here:

Discover CGroups mounted already: This should be used on newer systems like RHEL7 or Ubuntu16
or if the administrator mounts CGroups before YARN starts. Set yarn.nodemanager.linux-container-executor.cgroups.mount to
false and leave other settings set to their defaults. YARN will locate the mount points in /proc/mounts.
Common locations include /sys/fs/cgroup and /cgroup. The default location can vary depending
on the Linux distribution in use.
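
To illustrate what locating the mount points in /proc/mounts can look like, here is a minimal sketch. It is not YARN's actual implementation: the class CgroupMountFinder and the method findControllerMounts are hypothetical names, and the sketch assumes cgroup v1 entries in the usual /proc/mounts layout (device, mount point, filesystem type, options).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop.
final class CgroupMountFinder {

    /**
     * Maps each requested cgroup v1 controller to the mount point that carries it.
     * Each input line is expected in /proc/mounts layout:
     *   device mountpoint fstype options dump pass
     */
    static Map<String, String> findControllerMounts(List<String> mountLines,
                                                    List<String> controllers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : mountLines) {
            String[] fields = line.trim().split("\\s+");
            // Skip malformed lines and anything that is not a cgroup v1 mount.
            if (fields.length < 4 || !fields[2].equals("cgroup")) {
                continue;
            }
            // Controllers appear among the mount options, e.g. "rw,relatime,cpu,cpuacct".
            for (String option : fields[3].split(",")) {
                if (controllers.contains(option)) {
                    // A later entry for the same controller replaces an earlier one.
                    result.put(option, fields[1]);
                }
            }
        }
        return result;
    }
}
```

On a real host the lines could come from Files.readAllLines(Paths.get("/proc/mounts")), with the controller list chosen by the caller, for example cpu and memory.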

> NodeManager will fail to start if cpu subsystem is already mounted
> ------------------------------------------------------------------
>                 Key: YARN-8031
>                 URL: https://issues.apache.org/jira/browse/YARN-8031
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: JayceAu
>            Priority: Major
>         Attachments: image-2018-03-15-14-47-30-583.png
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true and the cpu
> subsystem is not yet mounted, NodeManager mounts the cpu subsystem and, if the mount step
> succeeds, creates the control group, whose default name is *hadoop-yarn*. This procedure
> works well if the cpu subsystem is not yet mounted. However, in some situations the cpu
> subsystem is already mounted before NodeManager starts, and NodeManager fails to start
> because it has no write permission to the *hadoop-yarn* path. For example:
>  # An OS that uses systemd, such as CentOS 7, mounts the cpu subsystem by default
> at machine startup.
>  # A daemon that starts before NodeManager may also rely on the mounted state of the
> cpu subsystem. In our production environment, we limit the CPU usage of the monitoring
> and control agent, which starts on reboot.
> To solve this problem, container-executor must be able to create the control
> group *hadoop-yarn* whether the controller was mounted successfully or was already mounted.
> Besides, if the cpu subsystem is used in combination with other subsystems and is already
> mounted, container-executor should use the latest mount point of the cpu subsystem instead
> of the one provided by NodeManager.
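
The behaviour proposed in the description can be sketched roughly as follows. This is an illustration only, not the container-executor code (which is not written in Java): YarnCgroupSetup, latestMountPoint and ensureControlGroup are hypothetical names, and the sketch assumes that later lines in /proc/mounts correspond to later mounts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the proposed behaviour; not the real container-executor.
final class YarnCgroupSetup {

    /** Returns the mount point of the last cgroup entry whose options include the controller. */
    static Optional<String> latestMountPoint(List<String> mountLines, String controller) {
        String latest = null;
        for (String line : mountLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 4 && f[2].equals("cgroup")
                    && Arrays.asList(f[3].split(",")).contains(controller)) {
                latest = f[1]; // keep overwriting so the last match wins
            }
        }
        return Optional.ofNullable(latest);
    }

    /**
     * Creates mountPoint/groupName if it does not exist yet, regardless of who
     * mounted the controller, and fails if the result is not writable.
     */
    static Path ensureControlGroup(Path mountPoint, String groupName) throws IOException {
        Path group = mountPoint.resolve(groupName);
        Files.createDirectories(group); // no-op when the directory already exists
        if (!Files.isWritable(group)) {
            throw new IOException("No write permission on " + group);
        }
        return group;
    }
}
```

A caller would first resolve the mount point with latestMountPoint(lines, "cpu") and then pass it to ensureControlGroup together with the group name, hadoop-yarn by default.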

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
