hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
Date Wed, 13 Sep 2017 00:12:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163906#comment-16163906

Wangda Tan commented on YARN-6620:


Thanks for reviewing the patch. 

A few responses to your questions/comments:
bq. 1. XML file reading in GpuDeviceInformationParser.java: can we use existing libraries
like javax.xml.bind.JAXBContext to unmarshal the XML document into a Java object instead of
reading it tag by tag?
My understanding is that JAXBContext is mostly used when we need to convert between Java objects
and XML/JSON. The output of nvidia-smi is a custom XML format that doesn't follow JAXB conventions,
so is JAXBContext still the best practice for this use case? For example, FairScheduler parses
its XML file directly: {{AllocationFileLoaderService#reloadAllocations}}.
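
To illustrate the direct-parsing approach, here is a minimal sketch using the JDK's built-in DOM parser. It is not the patch's GpuDeviceInformationParser; the class name, the GpuInfo holder, and the choice of fields are hypothetical. It assumes the `nvidia-smi -q -x` output has `gpu` elements containing `product_name` and `minor_number` children.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: read a few per-GPU fields from nvidia-smi style XML with DOM. */
public class NvidiaSmiXmlSketch {

  /** Hypothetical holder for the fields this sketch extracts. */
  public static final class GpuInfo {
    public final String productName;
    public final int minorNumber;

    GpuInfo(String productName, int minorNumber) {
      this.productName = productName;
      this.minorNumber = minorNumber;
    }
  }

  /** Returns the trimmed text of the first descendant with the given tag, or null if absent. */
  private static String firstText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  public static List<GpuInfo> parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Assumption: the output may reference an external DTD; don't try to load it.
      factory.setFeature(
          "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
      Document doc = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

      List<GpuInfo> gpus = new ArrayList<>();
      NodeList gpuNodes = doc.getElementsByTagName("gpu");
      for (int i = 0; i < gpuNodes.getLength(); i++) {
        Element gpu = (Element) gpuNodes.item(i);
        String minor = firstText(gpu, "minor_number");
        // Missing minor number is reported as -1 in this sketch.
        gpus.add(new GpuInfo(
            firstText(gpu, "product_name"),
            minor == null ? -1 : Integer.parseInt(minor)));
      }
      return gpus;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse nvidia-smi XML", e);
    }
  }
}
```

Compared with JAXB, this needs no annotated model classes mirroring the nvidia-smi format; the trade-off is extracting each field by hand, which is the same style FairScheduler uses for its allocation file.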

bq. 3. Instead of hardcoding the BINARY_NAME, can it be included as part of DEFAULT_NM_GPU_PATH_TO_EXEC
as a default value, so that it can also become configurable in case users want to change it?
I considered this option before. Unless there's a strong need to run a different command
or call NVIDIA native APIs directly, I would prefer to hardcode nvidia-smi rather than
introduce another abstraction layer. I'm open to refactoring to support this case
once we have such a requirement.
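
A minimal sketch of the split being discussed, assuming the configurable value is a directory and the binary name stays hardcoded. The class and the resolve method are hypothetical and are not taken from the patch; only the idea of a hardcoded BINARY_NAME comes from this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: configurable directory, hardcoded discovery binary. */
public class GpuBinaryResolverSketch {
  // Hardcoded: only nvidia-smi is used for GPU discovery in this sketch.
  public static final String BINARY_NAME = "nvidia-smi";

  /**
   * Combines an optional user-configured directory with the hardcoded binary name.
   * With no directory configured, returns the bare name so the OS resolves it via PATH.
   */
  public static String resolve(String configuredDir) {
    if (configuredDir == null || configuredDir.trim().isEmpty()) {
      return BINARY_NAME;
    }
    Path dir = Paths.get(configuredDir.trim());
    return dir.resolve(BINARY_NAME).toString();
  }
}
```

Making the full command configurable instead would be the extra abstraction layer; with this split, users can still point at a non-standard install location without being able to swap in a different tool.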

bq. 5. Can we use spaces instead of tab characters for indentation in nvidia-smi-sample-output.xml?
This file is copied directly from nvidia-smi output. The main reason is to make sure we can
properly parse real command-line output, so I prefer to keep it as-is.

bq. 6. Are we going to support multiple containers/processes (limited number) sharing the same
GPU device?
No. Since proper isolation can't be done in that case, I don't plan to support it for now.
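
Without sharing, the NM-side bookkeeping reduces to an exclusive map from device to owning container. A minimal sketch, assuming GPUs are identified by minor number and containers by string IDs; the class and method names are hypothetical and not from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: each GPU is owned by at most one container at a time. */
public class ExclusiveGpuAllocatorSketch {
  private final List<Integer> allDevices;
  // Device minor number -> owning container ID; an absent key means the device is free.
  private final Map<Integer, String> owners = new TreeMap<>();

  public ExclusiveGpuAllocatorSketch(List<Integer> deviceMinorNumbers) {
    this.allDevices = new ArrayList<>(deviceMinorNumbers);
  }

  /** Assigns {@code count} free GPUs to the container; assigns nothing if too few are free. */
  public synchronized List<Integer> allocate(String containerId, int count) {
    if (count < 0) {
      throw new IllegalArgumentException("count must be >= 0");
    }
    List<Integer> picked = new ArrayList<>();
    for (Integer dev : allDevices) {
      if (picked.size() == count) {
        break;
      }
      if (!owners.containsKey(dev)) {
        picked.add(dev);
      }
    }
    if (picked.size() < count) {
      throw new IllegalStateException("Only " + picked.size() + " free GPUs, requested " + count);
    }
    for (Integer dev : picked) {
      owners.put(dev, containerId);
    }
    return Collections.unmodifiableList(picked);
  }

  /** Frees every GPU held by the container. */
  public synchronized void release(String containerId) {
    owners.values().removeIf(containerId::equals);
  }

  public synchronized int freeCount() {
    return allDevices.size() - owners.size();
  }
}
```

This keeps the map in memory only; the NM restart/recovery item in the JIRA description would additionally require persisting it.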

Since YARN-3926 was just merged to trunk, I will add code to support reading/specifying the GPU
value in the resource object. I will address all comments in the next update.

> [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
> -------------------------------------------------------------------------------------
>                 Key: YARN-6620
>                 URL: https://issues.apache.org/jira/browse/YARN-6620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-6620.001.patch, YARN-6620.002.patch, YARN-6620.003.patch, YARN-6620.004.patch,
> This JIRA plans to add support for:
> 1) GPU configuration for NodeManagers
> 2) Isolation in CGroups (Java side)
> 3) NM restart and recovery of allocated GPU devices

This message was sent by Atlassian JIRA

