hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
Date Thu, 20 Jul 2017 21:17:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-6852:
-----------------------------
    Attachment: YARN-6852.001.patch

Attached YARN-6852.001 patch for review.

Thanks [~chris.douglas]/[~vinodkv]/[~sunilg]/[~subru]/[~sidharta-s]/[~vvasudev] for their
inputs. 

*The ver.001 patch adds:*
1) A "module" concept in the container-executor binary, which makes the code easier to manage
and decouples the modules' functionality. In the future, as described in YARN-5673, we can use
dlopen, etc. to dynamically load modules for even better code isolation.

2) "common-module": common logic shared by the different modules (located
in impl/modules/common), for example, checking whether a module is enabled.

3) "cgroups-module": Modifying the devices subsystem of a cgroup requires root permission (see \[1\]),
so some functionality has to move from the Java side to the C side. The cgroups-module is added for
this purpose (located in impl/modules/cgroups). Please note that, to avoid security issues, the
cgroups module doesn't have any user-invokable interface, so the "yarn" user cannot use "container-executor"
to perform any cgroups-related operations directly.
The configs of the cgroups module are shown below. Please note that they don't include a knob to enable
or disable the module, since it is not directly invoked by users (via the CLI).

4) "gpu-module": performs GPU isolation. Users can invoke the GPU operation with the following
command line to block a container from accessing the given GPUs:
{code}
container-executor gpu --excluded_gpus=0,1 --container_id=container_x_y_z
{code}
We will do strict checks on container_id to make sure a malicious user cannot use this tool
to block GPU devices in cgroups that are not owned by a YARN-launched container.
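
As an illustration of the kind of checks described above, here is a minimal sketch. It is not code from the patch: the patch implements this on the C side, and the function names, the regular expression for container IDs, and the exact validation rules below are all assumptions. The only external fact relied on is the entry format of the cgroup v1 devices.deny file ("c MAJOR:MINOR rwm").

```python
import re

# Assumed shape of a YARN container ID, with an optional epoch field
# such as "e17". The real check in the patch may be stricter or different.
CONTAINER_ID_RE = re.compile(r"container_(e\d+_)?\d+_\d+_\d+_\d+")


def is_valid_container_id(container_id):
    """Accept only strings that look exactly like a YARN container ID."""
    return CONTAINER_ID_RE.fullmatch(container_id) is not None


def parse_excluded_gpus(value):
    """Parse a value such as "0,1" into a list of minor device numbers."""
    minors = []
    for part in value.split(","):
        if re.fullmatch(r"[0-9]+", part) is None:
            raise ValueError("invalid GPU minor number: %r" % part)
        minors.append(int(part))
    return minors


def build_deny_entries(major, minors):
    """Build one cgroup v1 devices.deny entry per excluded GPU."""
    return ["c %d:%d rwm" % (major, minor) for minor in minors]
```

With major device number 195 and --excluded_gpus=0,1, build_deny_entries(195, parse_excluded_gpus("0,1")) yields one deny entry for each of the two devices. Matching the whole string with fullmatch is what rejects values such as "../container_1_1_1_1" that could otherwise point outside YARN's cgroup hierarchy.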

*For configs, this patch is based on YARN-6033 (not committed yet), so the configs of different
modules are placed under different sections, as follows:*
{code}
############## Original container-executor configs ##################
yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=##comma separated list of system users who CAN run applications
#####################################################################

[cgroups]
# Root of system cgroups (Cannot be empty or "/")
root=/sys/fs/cgroups
# Parent folder of YARN's CGroups (Cannot be empty)
yarn-hierarchy=yarn

[gpu]
# Enable / disable the module
module.enabled=true/false 
# Major device number of GPU, by default is 195
gpu.major-device-number=195
# Allowed minor device numbers, empty means all GPU devices managed by YARN.
gpu.allowed-device-minor-numbers=0,1,2,3
{code}
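
To make the sectioned layout concrete, here is a minimal sketch of how such a file could be read and how the two constraints on the \[cgroups\] section could be enforced. It is not the parser from YARN-6033 (which is written in C). The function names are hypothetical, and treating everything after "#" as a comment is an assumption.

```python
def parse_sectioned_config(text):
    """Parse key=value lines grouped under optional [section] headers.

    Keys that appear before any header are stored under the "" section.
    """
    config = {"": {}}
    section = ""
    for raw_line in text.splitlines():
        # Assumption: "#" starts a comment anywhere on the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            config.setdefault(section, {})
            continue
        if "=" not in line:
            raise ValueError("expected key=value, got: %r" % raw_line)
        key, value = line.split("=", 1)
        config[section][key.strip()] = value.strip()
    return config


def check_cgroups_section(cgroups):
    """Enforce the [cgroups] constraints and return YARN's cgroup parent path."""
    root = cgroups.get("root", "")
    if root in ("", "/"):
        raise ValueError('cgroups root cannot be empty or "/"')
    hierarchy = cgroups.get("yarn-hierarchy", "")
    if not hierarchy:
        raise ValueError("yarn-hierarchy cannot be empty")
    return root.rstrip("/") + "/" + hierarchy
```

Keeping the original container-executor keys under an unnamed top-level section means an existing config file without any section headers would still parse the same way in this sketch.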

\[1\]: https://lists.linuxfoundation.org/pipermail/containers/2012-November/030840.html 

> [YARN-6223] Native code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6852
>                 URL: https://issues.apache.org/jira/browse/YARN-6852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6852.001.patch
>
>
> This JIRA plans to add support for:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

