hadoop-yarn-issues mailing list archives

From "Szilard Nemeth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9138) Test error handling of nvidia-smi binary execution of GpuDiscoverer
Date Fri, 22 Feb 2019 20:52:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775585#comment-16775585

Szilard Nemeth commented on YARN-9138:

Hi [~adam.antal]!

Thanks for your comments, they are very detailed and valuable.

1. Good point, extracted most of the repetitive stuff into methods.

2. GpuDiscoverer locates nvidia-smi based on the path provided in the config, so I wanted
to keep the behaviour in tests as close as possible to the production code. As the script
is invoked by a call to Shell.execCommand(), we can count this as a hard dependency of this
class. It is quite hard to mock, and if I did that, it would change GpuDiscoverer in a more
fundamental way. To be precise, the bash script I "generate" in the test does not create any
new files, it just echoes the contents of a very basic XML. I would like to keep this as it
is. The only change I made in my new patch regarding this is the extraction of common things
into methods.
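
To illustrate the approach described in point 2, here is a minimal standalone sketch, not the code from the patch. It writes a bash script that only echoes a fixed XML string and then runs it. The class name FakeNvidiaSmiSketch, both helper methods, and the sample XML are hypothetical, and ProcessBuilder is used as a stand-in for Shell.execCommand() so the example has no Hadoop dependency. It assumes bash is available on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;

public class FakeNvidiaSmiSketch {

  // Hypothetical helper: writes a bash script whose only action is to echo
  // the given XML. The XML must not contain single quotes, because it is
  // embedded in a single-quoted shell string.
  static Path createFakeScript(Path dir, String xml) throws IOException {
    Path script = dir.resolve("nvidia-smi");
    String body = "#!/bin/bash\n" + "echo '" + xml + "'\n";
    Files.write(script, body.getBytes(StandardCharsets.UTF_8));
    script.toFile().setExecutable(true);
    return script;
  }

  // Hypothetical helper: runs the script through bash and returns its output.
  // This stands in for Shell.execCommand(), which the production code uses.
  static String run(Path script) throws IOException, InterruptedException {
    Process process = new ProcessBuilder("bash", script.toString())
        .redirectErrorStream(true)
        .start();
    String output;
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      output = reader.lines().collect(Collectors.joining("\n"));
    }
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Script exited with code " + exitCode);
    }
    return output;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("fake-nvidia-smi");
    // Illustrative placeholder XML, not real nvidia-smi output.
    Path script = createFakeScript(dir, "<nvidia_smi_log></nvidia_smi_log>");
    System.out.println(run(script));
  }
}
```

A faulty binary could be simulated in the same way by generating a script that exits with a non-zero code, in which case run() throws an IOException.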

3. Logging is not a common thing in tests, as far as my experience tells. I'm not saying that
it's good or bad, that's just what I have been seeing. Anyway, I added some logging instead
of the comments in testGetGpuDeviceInformationFaultyNvidiaSmiScriptConsecutiveRun. If you
have ideas on how to have better logs in this test class, feel free to report a new jira under

About the less concerning things: 
1. It was a great idea to extract the parent directory name to a constant so I did that!
2. I guess "RunLinuxGpuResourceDiscoverPluginConfigTest" is set either by the user running
the JVM (with a system property) or by some Jenkins job. Probably [~sunilg] can tell you more
about that, as I didn't modify the code and he was the committer of this back at the end of 2017.
3. Separation of testLinuxGpuResourceDiscoverPluginConfig: I agree, but I would create a follow-up
jira for that. The purpose of my change was not to refactor but rather extend the test coverage.
4. I didn't get your comment about the separation of "getNumberOfUsableGpusFromConfig".
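
For item 2 above, a minimal sketch of how a test could be gated on a JVM system property, assuming the flag is read with System.getProperty(). This is only a guess at the mechanism, as stated above. The class GatedTestSketch and the method isEnabled are hypothetical names, not part of the Hadoop code.

```java
public class GatedTestSketch {

  // Hypothetical helper: returns true only if the named system property is
  // set to "true" (case-insensitive). An unset property counts as disabled.
  static boolean isEnabled(String propertyName) {
    return Boolean.parseBoolean(System.getProperty(propertyName, "false"));
  }

  public static void main(String[] args) {
    String flag = "RunLinuxGpuResourceDiscoverPluginConfigTest";
    if (isEnabled(flag)) {
      System.out.println("Flag set, the test body would run.");
    } else {
      System.out.println("Flag not set, the test would be skipped.");
    }
  }
}
```

Under this assumption, the flag would be passed on the command line as -DRunLinuxGpuResourceDiscoverPluginConfigTest=true.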

Please check my latest patch!

> Test error handling of nvidia-smi binary execution of GpuDiscoverer
> -------------------------------------------------------------------
>                 Key: YARN-9138
>                 URL: https://issues.apache.org/jira/browse/YARN-9138
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9138.001.patch, YARN-9138.002.patch, YARN-9138.003.patch
> The code that executes nvidia-smi (doing GPU device auto-discovery) doesn't have much test
> This patch adds tests to this part of the code.

This message was sent by Atlassian JIRA

