hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-7371) Identify a minimum set of JARs needed to ship to Spark cluster [Spark Branch]
Date Fri, 11 Jul 2014 05:29:04 GMT


Chengxiang Li commented on HIVE-7371:

Similar to the MR and Tez engine implementations, four kinds of library dependencies should be shipped
to the Spark cluster:
# hive-exec JAR: according to the hive-exec module's build file, hive-exec is a fat JAR containing
a minimal set of dependencies for Hive execution.
# auxiliary JARs defined by the user through 'hive.aux.jars.path'.
# added JARs: the user can add JARs on the Hive CLI, and Hive should ship these JARs to the Spark cluster.
# plugin module dependencies, added on demand. For example, HBase dependencies are not shipped
to the Spark cluster by default, but if the data source is stored in HBase and HBaseStorageHandler is
used, Hive should ship the HBase-related JARs to the Spark cluster.
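The four groups above could be merged into a single de-duplicated list before handing it to the Spark client. A minimal sketch of that idea, assuming hypothetical names (`SparkJarShipper`, `collectSparkJars`) that are illustrative only and not part of the Hive API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SparkJarShipper {
    // Combine the four JAR groups, preserving order and dropping duplicates,
    // so a JAR listed both as auxiliary and as added is shipped only once.
    static List<String> collectSparkJars(String hiveExecJar,
                                         List<String> auxJars,      // from hive.aux.jars.path
                                         List<String> addedJars,    // added on the Hive CLI
                                         List<String> pluginJars) { // e.g. HBase, on demand
        Set<String> jars = new LinkedHashSet<>();
        jars.add(hiveExecJar);  // 1. the hive-exec fat JAR
        jars.addAll(auxJars);   // 2. user auxiliary JARs
        jars.addAll(addedJars); // 3. JARs added at runtime
        jars.addAll(pluginJars);// 4. plugin dependencies, only when needed
        return new ArrayList<>(jars);
    }

    public static void main(String[] args) {
        List<String> jars = collectSparkJars(
            "/opt/hive/lib/hive-exec.jar",
            Arrays.asList("/opt/udfs/my-udf.jar"),
            Arrays.asList("/tmp/session-added.jar"),
            Arrays.asList("/opt/hbase/lib/hbase-client.jar"));
        System.out.println(jars);
    }
}
```

Plugin JARs (group 4) would be passed in only when the query plan actually references the storage handler, which keeps the default shipping set small.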

> Identify a minimum set of JARs needed to ship to Spark cluster [Spark Branch]
> -----------------------------------------------------------------------------
>                 Key: HIVE-7371
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chengxiang Li
> Currently, the Spark client ships all Hive JARs, including those that Hive depends on, to the
Spark cluster when a query is executed by Spark. This is inefficient and causes potential
library conflicts. Ideally, only a minimum set of JARs needs to be shipped. This task is to
identify such a set.
> We should learn from the current MR setup, for which I assume only the hive-exec JAR is shipped
to the MR cluster.
> We also need to ensure that user-supplied JARs are shipped to the Spark cluster, in
a similar fashion to what MR does.
> NO PRECOMMIT TESTS. This is for spark-branch only.

This message was sent by Atlassian JIRA