hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2185) Use pipes when localizing archives
Date Mon, 22 Jan 2018 20:48:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334872#comment-16334872 ]

Jason Lowe commented on YARN-2185:
----------------------------------

Thanks for updating the patch!

Should a SuppressWarnings("deprecation") be added?  I personally would rather see that with
a comment next to the call site explaining why we're using a deprecated method rather than
add yet another warning to the pile, but I'm curious what others think here.

The earlier checks for paths with embedded single quotes are now missing.  The code should
escape single quotes in the filename so the shell does not mis-parse the command.
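
As a minimal sketch of the escaping described above (the class and method names are
hypothetical, not taken from the patch), a filename can be wrapped in single quotes for a
POSIX shell by replacing each embedded single quote with the sequence '\'':

```java
final class ShellQuoting {
  private ShellQuoting() {}

  /**
   * Wraps arg in single quotes for a POSIX shell. An embedded single quote
   * cannot appear inside a single-quoted string, so each one is rewritten as:
   * close quote, backslash-escaped quote, reopen quote.
   */
  static String quote(String arg) {
    return "'" + arg.replace("'", "'\\''") + "'";
  }
}
```

With this helper, a path such as it's.tar becomes 'it'\''s.tar', which the shell reads
back as the single word it's.tar.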

runCommandOnStream only creates a thread pool and reads the subprocess stdout and stderr
if logging is enabled.  If logging is disabled and the subprocess produces too much output
on either channel then this will deadlock.  The child process will stop consuming input while
it waits for its output stream to be drained, but the parent process will be blocked waiting
for the subprocess to consume more input.  We need to consume the subprocess stdout and stderr
even if we do not intend to log it.  If the data is not being logged or otherwise acted upon
then it can simply be thrown away.
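
One way to always consume the child's output is sketched below.  This is only an
illustration under assumptions: StreamDrainer and its methods are hypothetical names, and
the patch may structure this differently (for example by reusing its existing thread pool).

```java
import java.io.IOException;
import java.io.InputStream;

final class StreamDrainer {
  private StreamDrainer() {}

  /** Reads and discards everything from in until EOF; returns the byte count discarded. */
  static long drain(InputStream in) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      total += n;
    }
    return total;
  }

  /** Starts a daemon thread that discards in so the child never blocks on a full pipe. */
  static Thread drainInBackground(InputStream in, String threadName) {
    Thread t = new Thread(() -> {
      try {
        drain(in);
      } catch (IOException e) {
        // The pipe may be closed when the child exits; there is nothing left to drain.
      }
    }, threadName);
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

The drainers would be started on process.getInputStream() and process.getErrorStream()
before the parent begins writing the archive to process.getOutputStream(), regardless of
whether logging is enabled.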

Speaking of throwing away subprocess output, if the tar command fails there will be nothing
but an exit code to try to figure out what went wrong.  The existing unTarUsingTar avoids this
because the ShellCommandExecutor it uses reports the error output on failure.  I think
runCommandOnStream should throw an exception (e.g.: ExitCodeException or something similar)
containing the error output if the subprocess does not return a zero exit code.
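
A rough sketch of that check follows.  CommandFailedException is a hypothetical stand-in
for ExitCodeException, and checkExit is an illustrative helper that assumes the caller has
already captured the subprocess stderr as a string:

```java
import java.io.IOException;

final class ExitCheck {
  private ExitCheck() {}

  /** Hypothetical exception type carrying the exit code and captured stderr. */
  static final class CommandFailedException extends IOException {
    private final int exitCode;

    CommandFailedException(int exitCode, String stderr) {
      super("Command exited with code " + exitCode + ": " + stderr.trim());
      this.exitCode = exitCode;
    }

    int getExitCode() {
      return exitCode;
    }
  }

  /** Throws if the subprocess failed, so the caller sees stderr and not just a number. */
  static void checkExit(int exitCode, String stderr) throws CommandFailedException {
    if (exitCode != 0) {
      throw new CommandFailedException(exitCode, stderr);
    }
  }
}
```

In practice the captured stderr would probably need a size limit so that a very noisy
failure cannot exhaust memory.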


> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>            Priority: Major
>         Attachments: YARN-2185.000.patch, YARN-2185.001.patch, YARN-2185.002.patch, YARN-2185.003.patch,
>                      YARN-2185.004.patch, YARN-2185.005.patch, YARN-2185.006.patch, YARN-2185.007.patch, YARN-2185.008.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, and then
> removes it.  It would be more efficient to stream the data as it's being unpacked to avoid
> both the extra disk space requirements and the additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


