hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4122) Add support for GPU as a resource
Date Wed, 27 Apr 2016 12:08:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260021#comment-15260021 ]

Jun Gong commented on YARN-4122:
--------------------------------

{quote}
From the SLURM lists, it looks like prior to CUDA 7, the environment variable was not
working correctly:
https://devtalk.nvidia.com/default/topic/512869/cuda-accessing-all-devices-even-those-which-are-blacklisted/?offset=2
{quote}
We are using CUDA 7.5 now and, as far as I remember, we have not come across this problem.

{quote}
This design will probably also have to adjust for the work being done in YARN-4726.
{quote}
Is there any plan for YARN to support GPUs? It would be easier to build GPU support on top of YARN-3926.
Allocating GPUs on the NM will be somewhat complex, because we need to take the GPUs' topology
into account for better performance.
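
To illustrate why topology matters for allocation on the NM, here is a minimal sketch of one possible selection policy. It is not taken from the design document: the function name pick_gpus, the grouping of GPUs by a "topology group" (for example GPUs behind the same PCIe switch), and the best-fit rule are all assumptions made for illustration.

```python
def pick_gpus(free_by_group, count):
    """Pick `count` free GPUs, preferring GPUs from a single topology group.

    free_by_group is a hypothetical mapping from a group id (e.g. a PCIe
    switch) to the free GPU indices under it. Returns a sorted list of GPU
    indices, or None if fewer than `count` GPUs are free in total.
    """
    # Best fit: the smallest group that can satisfy the request on its own.
    fitting = [gpus for gpus in free_by_group.values() if len(gpus) >= count]
    if fitting:
        best = min(fitting, key=len)
        return sorted(best)[:count]

    # Otherwise span groups, taking GPUs from the largest groups first.
    picked = []
    for gpus in sorted(free_by_group.values(), key=len, reverse=True):
        picked.extend(sorted(gpus))
        if len(picked) >= count:
            return sorted(picked[:count])
    return None
```

With free GPUs {"switch0": [0, 1], "switch1": [2, 3, 4]}, a request for two GPUs stays inside switch0, while a request for four has to span both groups. A real allocator would likely weigh link types and NUMA placement as well.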

{quote}
In the doc you say that YARN is currently providing you GPU isolation. How are you making
that work?
{quote}
We use cgroups to enforce a hard limit. '*docker run --device=...*' does the same job, so with Docker we do not need
to set up cgroups ourselves.
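
As a rough sketch of the two approaches, the snippet below builds the entries one might write to a cgroup v1 devices.deny file to hide GPUs from a container, and the corresponding 'docker run --device=...' flags. The helper names are hypothetical, and it assumes NVIDIA GPUs exposed as /dev/nvidiaN character devices with major number 195 and minor number N; it does not touch the cgroup filesystem or run Docker.

```python
# Assumption: NVIDIA GPU device nodes /dev/nvidiaN use character major 195.
NVIDIA_MAJOR = 195


def deny_entries(all_gpus, allowed_gpus):
    """Return cgroup v1 devices.deny lines for every GPU not allowed."""
    allowed = set(allowed_gpus)
    return [
        f"c {NVIDIA_MAJOR}:{minor} rwm"  # deny read, write and mknod
        for minor in sorted(all_gpus)
        if minor not in allowed
    ]


def docker_device_flags(allowed_gpus):
    """Return 'docker run' flags that expose only the allowed GPUs."""
    return [f"--device=/dev/nvidia{minor}" for minor in sorted(allowed_gpus)]
```

For a node with GPUs 0-3 and a container allowed to use GPU 1 only, deny_entries([0, 1, 2, 3], [1]) yields the entries for GPUs 0, 2 and 3, and docker_device_flags([1]) yields the single flag for /dev/nvidia1. A real container would typically also need control devices such as /dev/nvidiactl, which this sketch leaves out.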

> Add support for GPU as a resource
> ---------------------------------
>
>                 Key: YARN-4122
>                 URL: https://issues.apache.org/jira/browse/YARN-4122
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: GPUAsAResourceDesign.pdf
>
>
> Use [cgroups devices|https://www.kernel.org/doc/Documentation/cgroups/devices.txt] to
isolate GPUs for containers. For docker containers, we could use 'docker run --device=...'.
> Reference: [SLURM Resources isolation through cgroups|http://slurm.schedmd.com/slurm_ug_2011/SLURM_UserGroup2011_cgroups.pdf].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
