hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
Date Mon, 18 Sep 2017 04:32:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169573#comment-16169573 ]

Wangda Tan commented on YARN-6620:
----------------------------------

[~tangzhankun], 

Sorry for the confusion. The attached patch still needs some cleanup and additional code
work, which will take another 2-3 days to finish. I will update the patch once it is in a
good state.

Regarding your comments:
bq. 1. Current GPUResourceAllocator is not got from LocalResourceAllocators but created in
GpuResourceHandlerImpl directly. Is this intended?
Yes, it is intended. Ideally each plugin should maintain its own allocator, handler, etc. I
plan to remove LocalResourceAllocators.

bq. 2. The GpuResourceHandler get container's requested GPU from an environment key "REQUESTED_GPU_NUM".
So in fact, there's no need to define the allowed GPU resource in "node-resouce.xml"
Ah, this code was written before the YARN-3926 merge, so I will update it to use resource
profiles in the next uploaded patch.
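
For illustration only (this is not code from the patch): a minimal Java sketch of the
environment-key lookup described in point 2. Only the key name "REQUESTED_GPU_NUM" comes
from the comment above. The class name GpuRequestSketch, the method requestedGpus, and the
choice to treat a missing or blank value as zero GPUs are assumptions made for the example.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of the YARN-6620 patch. */
final class GpuRequestSketch {
  // Environment key named in the review comment.
  static final String ENV_KEY = "REQUESTED_GPU_NUM";

  private GpuRequestSketch() {
  }

  /**
   * Returns the GPU count requested through the container environment.
   * Assumption: a missing or blank value means the container asked for no GPUs.
   */
  static int requestedGpus(Map<String, String> env) {
    String raw = env.get(ENV_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return 0;
    }
    int count;
    try {
      count = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid " + ENV_KEY + ": " + raw, e);
    }
    if (count < 0) {
      throw new IllegalArgumentException("Negative " + ENV_KEY + ": " + raw);
    }
    return count;
  }
}
```

A lookup like this depends on every client setting the environment variable correctly,
which is one reason to move the request into the resource model instead.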

bq. For instance, if different vendors' GPU cards are installed in the cluster, how can a
user distinguish them? thru node attributes?
Good point. I think we should use node attributes to distinguish them. This is probably
unavoidable: different DL workloads need different driver versions and GPU architectures, and
different frameworks such as OpenCL or CUDA, so we need node attributes anyway.

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, YARN-6620.003.patch, YARN-6620.004.patch,
>                      YARN-6620.005.patch, YARN-6620.006-WIP.patch
>
>
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups. (Java side).
> 3) NM restart and recovery of allocated GPU devices



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

