spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-9974] [BUILD] [SQL] Makes sure com.twitter:parquet-hadoop-bundle:1.6.0 is in SBT assembly jar
Date Tue, 18 Aug 2015 00:25:24 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 0f1417b6f -> 407175e82


[SPARK-9974] [BUILD] [SQL] Makes sure com.twitter:parquet-hadoop-bundle:1.6.0 is in SBT assembly jar

PR #7967 enables Spark SQL to persist Parquet tables in Hive-compatible format when possible.
One consequence is that we have to set the input/output format classes to `MapredParquetInputFormat`/`MapredParquetOutputFormat`,
which rely on com.twitter:parquet-hadoop-bundle:1.6.0 bundled with Hive 1.2.1.

When loading such a table in Spark SQL, `o.a.h.h.ql.metadata.Table` first loads these input/output
format classes, and thus classes in com.twitter:parquet-hadoop-bundle:1.6.0.  However, the scope
of this dependency is defined as "runtime", so it is not packaged into the Spark assembly jar built by SBT.
This results in a `ClassNotFoundException`.
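
To check whether a given classpath is affected, a probe along the following lines can help. This is
only a minimal sketch and not part of this change: the class `ClasspathProbe` and its `isLoadable`
helper are hypothetical, and the two fully qualified class names are my assumption of what the Hive
input format and a class from the parquet bundle are called.

```java
public class ClasspathProbe {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed names: the Hive input format mentioned above, and a class
        // that is expected to live in com.twitter:parquet-hadoop-bundle.
        String[] candidates = {
            "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "parquet.hadoop.ParquetInputFormat"
        };
        for (String name : candidates) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Running it with the assembly jar on the classpath should show whether the bundle's classes made it
into the jar; if the second name is reported as not loadable, the jar is likely affected.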

This issue can be worked around by asking users to add parquet-hadoop-bundle 1.6.0 via the `--driver-class-path`
option.  However, considering that the Maven build is immune to this problem, I feel this workaround
can be confusing and inconvenient for users.

So this PR fixes the issue by changing the scope of parquet-hadoop-bundle 1.6.0 to "compile".

Author: Cheng Lian <lian@databricks.com>

Closes #8198 from liancheng/spark-9974/bundle-parquet-1.6.0.

(cherry picked from commit 52ae952574f5d641a398dd185e09e5a79318c8a9)
Signed-off-by: Reynold Xin <rxin@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/407175e8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/407175e8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/407175e8

Branch: refs/heads/branch-1.5
Commit: 407175e824169a01762bdd27f704ac017d6d3e60
Parents: 0f1417b
Author: Cheng Lian <lian@databricks.com>
Authored: Mon Aug 17 17:25:14 2015 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Mon Aug 17 17:25:21 2015 -0700

----------------------------------------------------------------------
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/407175e8/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index cfd7d32..9bfca1c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1598,7 +1598,7 @@
         <groupId>com.twitter</groupId>
         <artifactId>parquet-hadoop-bundle</artifactId>
         <version>${hive.parquet.version}</version>
-        <scope>runtime</scope>
+        <scope>compile</scope>
       </dependency>
       <dependency>
         <groupId>org.apache.flume</groupId>

