hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6620) Add support in NodeManager to isolate GPU devices by using CGroups
Date Tue, 17 Oct 2017 23:24:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208590#comment-16208590 ]

Wangda Tan commented on YARN-6620:
----------------------------------

[~tangzhankun], 

I may not have made this clear: what I meant is that GPU should be a first-class resource
rather than a mandatory resource. To me, the only mandatory resources for now are memory and
vcores; in the future we might add network/disk as mandatory resources.

Definition of a mandatory resource: a resource that is required in order to run a process.
Definition of a first-class resource: a resource that is officially supported by YARN.
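
To illustrate the two definitions, here is a minimal sketch. The class and method names are
hypothetical and are not part of YARN; the resource names "memory-mb", "vcores" and
"yarn.io/gpu" are used only as examples of each kind.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: this class is hypothetical and is not YARN's actual API.
final class ResourceKinds {
    // Mandatory: a process cannot run without these.
    static final Set<String> MANDATORY =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("memory-mb", "vcores")));

    // First-class: officially supported by YARN, but a request may omit them.
    static final Set<String> FIRST_CLASS =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("yarn.io/gpu")));

    /** True if the request names every mandatory resource. */
    static boolean hasAllMandatory(Map<String, Long> request) {
        return request.keySet().containsAll(MANDATORY);
    }

    /** True if the request asks for at least one first-class resource. */
    static boolean usesFirstClass(Map<String, Long> request) {
        for (String name : request.keySet()) {
            if (FIRST_CLASS.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, a request with only memory and vcores is complete, while a request that
names only a GPU is not, because GPU is first-class but not mandatory.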

For your questions.

bq. 1. First-class resource should be parsed from resource-types.xml and node-resources.xml (or
auto-discovered) instead of yarn-site.xml?
To me, all resources beyond memory/vcores (which are handled differently for historical
reasons) should be defined in resource-types.xml and node-resources.xml, regardless of whether
they are mandatory or first-class.

bq. 2. First-class resource handler should register itself with the same resource name defined
in xml files?
To me this is true when resource isolation on the NM side is required. All first-class resource
names should start with the "yarn.io/" namespace.
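
A rough sketch of that registration rule follows. Everything in it is hypothetical (the
registry class, the use of Runnable as a stand-in for a handler, and the idea of passing in
the declared names as a set); it is not how the NodeManager actually wires its resource
handlers, only an illustration of "same name as in the xml files, under yarn.io/".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: a hypothetical registry, not the NodeManager's real handler chain.
final class IsolationHandlerRegistry {
    static final String NAMESPACE = "yarn.io/";

    // Resource names as they would appear in the XML files (assumed to be parsed elsewhere).
    private final Set<String> declaredTypes;
    private final Map<String, Runnable> handlers = new HashMap<>();

    IsolationHandlerRegistry(Set<String> declaredTypes) {
        this.declaredTypes = declaredTypes;
    }

    /** Registers a handler under exactly the name the resource was declared with. */
    void register(String resourceName, Runnable handler) {
        if (!resourceName.startsWith(NAMESPACE)) {
            throw new IllegalArgumentException(
                "first-class resource must be in the " + NAMESPACE + " namespace: " + resourceName);
        }
        if (!declaredTypes.contains(resourceName)) {
            throw new IllegalArgumentException("resource is not declared: " + resourceName);
        }
        if (handlers.putIfAbsent(resourceName, handler) != null) {
            throw new IllegalStateException("handler already registered: " + resourceName);
        }
    }

    boolean hasHandler(String resourceName) {
        return handlers.containsKey(resourceName);
    }
}
```

With "yarn.io/gpu" declared, registering a handler under "yarn.io/gpu" succeeds, while
registering under plain "gpu" or under an undeclared name is rejected.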

bq. 3. First-class resource should be shown in a separate user-defined column in web pages?
I'm not sure about this. In the future we may add more and more first-class / mandatory
resources, and it might be too much if we add a column for every new resource. To me the ideal
solution is to let users select and filter columns in the web UI (supporting this in the new
UI should be a trivial task).

> Add support in NodeManager to isolate GPU devices by using CGroups
> ------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>             Fix For: 3.1.0
>
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, YARN-6620.003.patch, YARN-6620.004.patch,
>                      YARN-6620.005.patch, YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch,
>                      YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, YARN-6620.012.patch,
>                      YARN-6620.013.patch, YARN-6620.014.patch, YARN-6620.015.patch, YARN-6620.016.patch,
>                      YARN-6620.017.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side)
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


