hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6620) Add support in NodeManager to isolate GPU devices by using CGroups
Date Tue, 17 Oct 2017 23:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208590#comment-16208590 ]

Wangda Tan commented on YARN-6620:


I may not have made it clear: what I meant is that GPU should be a first-class resource rather than a
mandatory resource. To me, the only mandatory resources for now are memory and vcores; in the future
we might add network/disk as mandatory resources.

The definition of a mandatory resource: a resource that is required in order to run a process.
The definition of a first-class resource: a resource that is officially supported by YARN.

Regarding your questions:

bq. 1. First-class resource should be parsed from resource-types.xml and node-resources.xml (or
auto discover) instead of yarn-site.xml?
To me, all resources beyond memory/vcores (which are exceptions for historical reasons) should
be defined in resource-types.xml and node-resources.xml, regardless of whether or not they are mandatory.
bq. 2. First-class resource handler should register itself with the same resource name defined
in xml files?
I think this is true when resource isolation is required on the NM side. All first-class resources
should start with the "yarn.io/" namespace.
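The distinction between mandatory and first-class resources, together with the "yarn.io/" naming rule from the answer above, can be sketched as a small classifier. This is a hypothetical Python sketch, not YARN code: it assumes `memory-mb` and `vcores` as the mandatory resource names and treats any other name under the `yarn.io/` prefix as first-class. Mandatory resources are checked first because they are also officially supported.

```python
from enum import Enum


class ResourceKind(Enum):
    MANDATORY = "mandatory"      # required in order to run a process
    FIRST_CLASS = "first-class"  # officially supported by YARN
    CUSTOM = "custom"            # anything else a site defines itself


# Assumed names for YARN's built-in memory and vcores resources.
MANDATORY_RESOURCES = {"memory-mb", "vcores"}
# Namespace that, per this comment, every first-class resource should use.
FIRST_CLASS_PREFIX = "yarn.io/"


def classify(resource_name: str) -> ResourceKind:
    """Hypothetical helper: classify a resource name by the rules above."""
    if resource_name in MANDATORY_RESOURCES:
        return ResourceKind.MANDATORY
    if resource_name.startswith(FIRST_CLASS_PREFIX):
        return ResourceKind.FIRST_CLASS
    return ResourceKind.CUSTOM
```

For example, `classify("yarn.io/gpu")` gives `ResourceKind.FIRST_CLASS`, while a site-specific name such as `example.com/license` falls through to `ResourceKind.CUSTOM`.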

bq. 3. First-class resource should be shown in a separate user-defined column in web pages?
I'm not sure about this. In the future we may add more and more first-class/mandatory resources,
and adding a column for every new resource might be too much. To me, the ideal solution is to let
users select and filter columns in the web UI (supporting this in the new UI should be trivial).

> Add support in NodeManager to isolate GPU devices by using CGroups
> ------------------------------------------------------------------
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 3.1.0
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, YARN-6620.003.patch, YARN-6620.004.patch,
>                      YARN-6620.005.patch, YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch,
>                      YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, YARN-6620.012.patch,
>                      YARN-6620.013.patch, YARN-6620.014.patch, YARN-6620.015.patch, YARN-6620.016.patch,
>                      YARN-6620.017.patch
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side)
> 3) NM restart and recovery of allocated GPU devices

This message was sent by Atlassian JIRA
