hive-dev mailing list archives

From "Josh Spiegel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2269) Hive --auxpath option can't handle multiple colon separated values
Date Tue, 25 Jun 2013 16:03:22 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693139#comment-13693139 ]

Josh Spiegel commented on HIVE-2269:
------------------------------------

No matter what the intent was, the case where HIVE_AUX_JARS_PATH is a list is broken. In the
list case, the value of this variable is used in two contradictory ways:

(1) It is appended, with a colon separator, to AUX_CLASSPATH, then CLASSPATH, then HADOOP_CLASSPATH
(2) It is split on "," and appended to Hadoop's -libjars and hive.aux.jars.path

If you use commas, (1) doesn't work.  If you use colons, (2) doesn't work.
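The mismatch can be illustrated with a small sketch. It does not reproduce the Hive launcher script; count_entries is a hypothetical helper and the JAR paths are made up. It only shows how many entries each kind of consumer sees for each separator:

```shell
# count_entries SEP LIST: number of entries seen when LIST is split on SEP.
# (Hypothetical helper, not part of Hive.)
count_entries() {
  printf '%s\n' "$2" | tr "$1" '\n' | wc -l | tr -d ' '
}

# Hypothetical JAR paths, for illustration only.
comma_list="/tmp/a.jar,/tmp/b.jar"
colon_list="/tmp/a.jar:/tmp/b.jar"

# Use (1): classpath-style consumers split on ':'.
echo "comma list as classpath: $(count_entries ':' "$comma_list") entry"
echo "colon list as classpath: $(count_entries ':' "$colon_list") entries"

# Use (2): consumers such as -libjars split on ','.
echo "comma list as libjars:   $(count_entries ',' "$comma_list") entries"
echo "colon list as libjars:   $(count_entries ',' "$colon_list") entry"
```

In each case, the list that uses the "wrong" separator collapses into a single entry that does not name a real file.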

I am not sure why user JARs need to go in HADOOP_CLASSPATH, but I have noticed failures in
local mode if one of the user JARs contains an InputFormat and is not in HADOOP_CLASSPATH.
I think this is probably due to a class caching/loading bug in Hive. Whatever the reason,
HADOOP_CLASSPATH needs to be set if the JARs contain a custom InputFormat. Otherwise,
setting -libjars and hive.aux.jars.path seems to be sufficient. This is probably why more
people haven't been impacted by this bug: for simple UDFs, using ',' works because the
comma-separated list in HADOOP_CLASSPATH is silently ignored.

Consider a user with the following constraints:
- Cannot modify $HIVE_HOME (i.e. cannot add jars to $HIVE_HOME/auxlib)
- Has multiple JARs
- The JARs contain at least one custom InputFormat

In current releases, this user must pick one of the following options:
(1) Add the list of JARs to HADOOP_CLASSPATH manually (with colons) and then set HIVE_AUX_JARS_PATH
(or --auxpath) with commas.  The comma list still gets appended to the end of HADOOP_CLASSPATH
but it should be ignored by the JVM.
(2) Copy all JARs to a single directory.  Set HIVE_AUX_JARS_PATH (or --auxpath) to this single
directory.
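Option (1) might look roughly like the following sketch. The join_with helper and the JAR paths are hypothetical; only the two environment variable names come from the discussion above:

```shell
# join_with SEP ITEM...: join the items with SEP between them.
# (Hypothetical helper, not part of Hive or Hadoop.)
join_with() {
  sep=$1
  shift
  out=""
  for item in "$@"; do
    if [ -z "$out" ]; then
      out=$item
    else
      out="$out$sep$item"
    fi
  done
  printf '%s\n' "$out"
}

# Hypothetical JAR locations.
jar1=/home/user/jars/format.jar
jar2=/home/user/jars/udfs.jar

# Colons for the classpath; keep any existing value in front.
HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$(join_with ':' "$jar1" "$jar2")"
# Commas for the value that feeds -libjars and hive.aux.jars.path.
HIVE_AUX_JARS_PATH="$(join_with ',' "$jar1" "$jar2")"
export HADOOP_CLASSPATH HIVE_AUX_JARS_PATH
```

Hive would then be launched as usual from the same shell.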

Am I missing anything?  Note, "add jars" is not an option for JARs with custom InputFormats
as it does not change HADOOP_CLASSPATH.

I should also mention that Carl's patch allows colons to be used successfully in --auxpath.
His patch was picked up by or before CDH3u3 but dropped sometime later. So, there was a brief
period during which CDH users could successfully specify a list of JARs with just --auxpath.

Finally, when defining the semantics of "--auxpath", it should support a list of local files
(not URIs) that can be either comma or colon separated.  The script should reformat the list
depending on where it is used.  Supporting just commas or just colons would not be backwards
compatible.  Supporting URIs would not be backwards compatible.  Adding support for directories
in lists may be OK.
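
A minimal sketch of the reformatting idea, assuming the list contains only local file paths. The function names are hypothetical and this is not the attached patch:

```shell
# Accept either separator on input and emit the form each consumer expects.
# Assumes local paths only: a URI such as hdfs://... contains ':' and would
# be mangled by to_comma_list.
to_colon_list() { printf '%s\n' "$1" | tr ',' ':'; }
to_comma_list() { printf '%s\n' "$1" | tr ':' ','; }

# Hypothetical user-supplied value mixing both separators.
aux="/tmp/a.jar,/tmp/b.jar:/tmp/c.jar"

# Form suitable for appending to CLASSPATH or HADOOP_CLASSPATH.
to_colon_list "$aux"
# Form suitable for -libjars and hive.aux.jars.path.
to_comma_list "$aux"
```

Because ':' is part of every URI scheme, this kind of normalization is only safe as long as --auxpath is restricted to local files.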
                
> Hive --auxpath option can't handle multiple colon separated values
> ------------------------------------------------------------------
>
>                 Key: HIVE-2269
>                 URL: https://issues.apache.org/jira/browse/HIVE-2269
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.7.0, 0.7.1
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>         Attachments: HIVE-2269-auxpath.1.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
