hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2185) Use pipes when localizing archives
Date Fri, 12 Jan 2018 20:56:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324545#comment-16324545
] 

Jason Lowe commented on YARN-2185:
----------------------------------

Thanks for updating the patch!

bq. I was wondering, if it should be better for OS specific archives containing symlinks for
example.

symlinks is a good example of why it is important to use {{tar}} on Linux.  However if we're
going to use OS tools to support ZIP files properly then we should not be using the {{jar}}
command.  {{jar}} does not properly support symlinks in a ZIP archive but {{unzip}} does.
 To be honest I don't see the appeal of using the jar command for anything since I do not
see how it is going to do anything different than the JDK builtin support since it is part
of the JDK.  As with the parallel copying and timestamp ignore support, I think it may be
best to have this patch focus on using pipes rather than also changing the semantics of what
tools are used to unpack the archives.  We can discuss what should be done with jars and zips
in another JIRA.

Speaking of symbolic links, now that all supported releases are on JDK7 or later, we can update
unTarUsingJava to fix the lacking symlink support.  I filed HADOOP-15170 to track that.

The failure on the path containing a single quote seems overly paranoid.  Single quotes can
be escaped in the POSIX shell code path per my previous comment, e.g.: 
{code}
  filename.replace("'","'\\''")
{code}

Rather than do the subshell with an extra {{cd}}, could it be simplified to just {{ tar -C
}}?

It doesn't look like the following comment was addressed as runCommandOnStream is still creating
an executor sometimes yet does not clean it up in those cases.  The Javadoc should also be
updated to document the deadlock potential.
{quote}
I think we should shutdown the executor service if we created it, otherwise we risk running
out of threads before the garbage collector gets around to noticing it can free everything
up.  Either that or require the caller to provide the executor.
{quote}

Speaking of deadlock, if debug logging is not enabled then this can deadlock in the manner
I described above.  That's one of the reasons why I think leveraging Shell isn't a terrible
idea.  Wielding ProcessBuilder and Process properly is trickier than most people believe.
 It also side-steps a lot of the executor management stuff.  I'm OK if you still think it's
better to roll our own here.

"parallelVerifyAndCopy" is no longer parallel so the name and javadoc is misleading.  It also
does not throw InterruptedException nor ExecutionException.

It's weird to allow the user to specify a downloader thread count yet we setup and teardown
the executor for every unpack call and the unpack call is only going to use two threads. 
Essentially it shouldn't be a config since any value but 2 is either wrong or wasteful.

Speaking of executors, would it makes more sense for FSDownload to be able to leverage an
existing executor rather than creating its own?  All of the existing usages of FSDownload
are using an executor of one form or another, so it'd be nice not to have to keep building
and tearing these down for every call.

I'm still confuse about the recursive destination directory delete that was added.  It was
not there before, and I don't think it's appropriate here.  What are we worried is going to
be there, and if somehow something is there, are we sure it's OK to just blindly remove it?

destionationTmp s/b destinationTmp

FSDownloadException is no longer needed.


> Use pipes when localizing archives
> ----------------------------------
>
>                 Key: YARN-2185
>                 URL: https://issues.apache.org/jira/browse/YARN-2185
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Jason Lowe
>            Assignee: Miklos Szegedi
>         Attachments: YARN-2185.000.patch, YARN-2185.001.patch, YARN-2185.002.patch, YARN-2185.003.patch,
YARN-2185.004.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, and then
removes it.  It would be more efficient to stream the data as it's being unpacked to avoid
both the extra disk space requirements and the additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message