hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12747) support wildcard in libjars argument
Date Tue, 01 Mar 2016 20:20:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174355#comment-15174355 ]

Chris Nauroth commented on HADOOP-12747:

bq. You mentioned earlier that libjars don't support non-local paths, but strictly speaking
HADOOP-7112 addresses only the aspect of adding libjars back to the client classpath.

That's very interesting.  I missed the point that non-local jars are skipped only when adding
to the client's own classpath.  {{JobResourceUploader}} parses libjars separately and does
not apply the same filtering.  Since non-local libjars are therefore already supported for the
task, we'd have to maintain that behavior for backward compatibility.
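The asymmetry described above can be sketched as follows. This is a hypothetical illustration, not Hadoop's code: the class and method names are invented, and the sketch assumes an entry counts as local when it has no URI scheme or the {{file}} scheme. The same libjars list feeds two consumers, and only the client-classpath consumer drops non-local entries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop's code: one -libjars list, two consumers.
class LibJarsSplit {

    // Assumption: an entry is "local" if it has no URI scheme or the file scheme.
    // Windows drive-letter paths (C:\...) are not handled by this sketch.
    static boolean isLocal(String entry) {
        String scheme = URI.create(entry).getScheme();
        return scheme == null || scheme.equals("file");
    }

    // Client side: only local entries go onto the submitting JVM's classpath.
    static List<String> clientClasspathEntries(List<String> libjars) {
        List<String> local = new ArrayList<>();
        for (String jar : libjars) {
            if (isLocal(jar)) {
                local.add(jar);
            } else {
                System.err.println("not added to client classpath: " + jar);
            }
        }
        return local;
    }

    // Task side: every entry is kept for shipping to the tasks, local or not.
    static List<String> taskResourceEntries(List<String> libjars) {
        return new ArrayList<>(libjars);
    }

    public static void main(String[] args) {
        List<String> libjars = List.of(
                "/opt/app/lib/a.jar", "hdfs://nn:8020/libs/b.jar", "file:///tmp/c.jar");
        // prints "client: [/opt/app/lib/a.jar, file:///tmp/c.jar]"
        System.out.println("client: " + clientClasspathEntries(libjars));
        // prints all three entries
        System.out.println("tasks:  " + taskResourceEntries(libjars));
    }
}
```

Any change that made the two consumers agree (for example, filtering on the task side as well) would drop the non-local task behavior that already works today, which is the compatibility concern raised above.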

I find the lack of consistency quite confusing.  It's unclear to me how much of this behavior
is by design and how much is accidental.  I assume non-local jars were filtered out of the
client's classpath to avoid the complexity of running some kind of "mini-localization" on the
client side to support them.

Regarding the proposed options, I have a question on this con for option 2:

bq. con: need to re-interpret or deprecate (minor) behavior, such as adding libjar entries
to the client classpath and allowing directories as a set of classfiles

This sounds backwards-incompatible, right?  If so, then that would tip my opinion towards
option 1.

Also, if wildcard expansion is delayed, there could be a risk of unexpected behavior if the
contents of the directory change after job submission but before the container launches.
Rolling upgrade scenarios in particular might get weird.  (Maybe not, if the directories
themselves are properly version-stamped.)
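To make the timing risk concrete, the hypothetical sketch that follows compares the jar listing a wildcard produced at job submission with the listing the same wildcard would produce at container launch. The class name and jar names are invented for illustration, and the sketch assumes listings are compared as plain path strings; a stricter check might also compare sizes or checksums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: detect drift between two expansions of the same wildcard.
class ExpansionDrift {

    // Entries present in 'after' but not in 'before', in sorted order.
    static List<String> added(List<String> before, List<String> after) {
        TreeSet<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return new ArrayList<>(result);
    }

    // Entries present in 'before' but missing from 'after', in sorted order.
    static List<String> removed(List<String> before, List<String> after) {
        return added(after, before);
    }

    public static void main(String[] args) {
        // Invented listings of lib/* : an upgrade replaced foo-1.0 with foo-1.1
        // between submission and launch.
        List<String> atSubmit = List.of("lib/a.jar", "lib/foo-1.0.jar");
        List<String> atLaunch = List.of("lib/a.jar", "lib/foo-1.1.jar");
        System.out.println("added:   " + added(atSubmit, atLaunch));   // [lib/foo-1.1.jar]
        System.out.println("removed: " + removed(atSubmit, atLaunch)); // [lib/foo-1.0.jar]
    }
}
```

Expanding the wildcard once on the client and passing along the resulting list fixes which files are selected at submit time, so this kind of selection drift cannot occur; a version-stamped directory achieves a similar effect by giving each release a distinct path.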

> support wildcard in libjars argument
> ------------------------------------
>                 Key: HADOOP-12747
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12747
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-12747.01.patch, HADOOP-12747.02.patch, HADOOP-12747.03.patch
> There is a problem when a user job adds too many dependency jars on their command line. The HADOOP_CLASSPATH part can be addressed, including by using wildcards (*). But the same cannot be done with the -libjars argument. Today it takes only fully specified file paths.
> We may want to consider supporting wildcards as a way to help users in this situation. The idea is to handle it the same way the JVM does: * expands to the list of jars in that directory. It does not traverse into any child directory.
> Also, it would probably be a good idea to do this only for libjars (i.e. not for -files and -archives).
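The JVM-style semantics proposed in the description can be sketched as follows. This is a hypothetical illustration, not any of the patches attached to the issue. It assumes the libjars value is a comma-separated list, that a trailing {{/*}} (or a bare {{*}}) marks a wildcard, and that paths use Unix-style separators. Following the JVM classpath rules, only files ending in {{.jar}} or {{.JAR}} directly inside the directory match; child directories are not searched.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of JVM-style wildcard expansion for a -libjars value.
class LibJarsWildcard {

    static List<String> expand(String libjars) throws IOException {
        List<String> out = new ArrayList<>();
        for (String raw : libjars.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.equals("*") || entry.endsWith("/*")) {
                String dirPart = entry.substring(0, entry.length() - 1); // "" or "dir/"
                Path dir = Paths.get(dirPart.isEmpty() ? "." : dirPart);
                List<String> jars = new ArrayList<>();
                // newDirectoryStream lists direct children only, so there is no recursion.
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                    for (Path p : stream) {
                        String name = p.getFileName().toString();
                        if (Files.isRegularFile(p)
                                && (name.endsWith(".jar") || name.endsWith(".JAR"))) {
                            jars.add(p.toString());
                        }
                    }
                }
                // The JVM leaves the order unspecified; sorting keeps this sketch deterministic.
                Collections.sort(jars);
                out.addAll(jars);
            } else {
                out.add(entry); // fully specified path: passed through unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("libjars");
        Files.createFile(dir.resolve("a.jar"));
        Files.createFile(dir.resolve("b.JAR"));
        Files.createFile(dir.resolve("notes.txt"));                    // not a jar: skipped
        Files.createDirectories(dir.resolve("sub"));
        Files.createFile(dir.resolve("sub").resolve("c.jar"));         // child dir: skipped
        System.out.println(expand(dir + "/*,/opt/extra/d.jar"));
    }
}
```

Restricting the wildcard to a trailing {{*}} mirrors the JVM, where a pattern such as {{lib/*.jar}} is not treated as a wildcard; it also keeps the rule narrow enough to apply to libjars alone, as the description suggests.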

This message was sent by Atlassian JIRA
