kudu-commits mailing list archives

From granthe...@apache.org
Subject kudu git commit: [Java] Remove shaded parquet and spark-avro dependencies
Date Mon, 23 Apr 2018 19:41:08 GMT
Repository: kudu
Updated Branches:
  refs/heads/master 08b01ecf3 -> 28d847513


[Java] Remove shaded parquet and spark-avro dependencies

Currently the kudu-client-tools project shades parquet and
the kudu-spark-tools project shades spark-avro. According
to the commit history, this is done for classpath convenience
when the parquet and avro import/export options are used.

However, our current shading configuration in Maven is not
pulling in transitive dependencies but instead only including
the direct classes from those jars. If it works at runtime it's just
luck that the rest of the classes are there or unused.

The Gradle build currently includes all of the transitive
dependencies, but that results in a very large jar that
includes things like the Scala library, Jackson, Snappy, etc.

Instead of packaging/shading such large dependencies,
this patch changes the dependencies to provided scope, which
accurately represents the fact that the libraries are expected
to be on the classpath at runtime, and documents the details.

Change-Id: Iccf46be9eebb91e900a9ebb5f99b6510165956e7
Reviewed-on: http://gerrit.cloudera.org:8080/10147
Tested-by: Grant Henke <granthenke@apache.org>
Reviewed-by: Adar Dembo <adar@cloudera.com>
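
Because the patch makes parquet-hadoop and spark-avro provided-scope
dependencies, a job fails at runtime if those jars are missing from
the classpath. A minimal sketch of a pre-flight check follows. It is
not part of this commit: the class name ProvidedDepsCheck and the two
probe class names (one well-known class from each library) are
assumptions chosen for illustration.

```java
// Hypothetical helper, not part of the Kudu source tree.
public class ProvidedDepsCheck {

    // Returns true if the named class can be located on the current classpath.
    public static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ProvidedDepsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed probe classes: one from parquet-hadoop, one from spark-avro.
        String[] required = {
            "org.apache.parquet.hadoop.ParquetWriter",
            "com.databricks.spark.avro.DefaultSource"
        };
        for (String name : required) {
            System.out.println(name + ": " + (isPresent(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this with the same classpath as the import/export job shows
whether the provided jars were actually supplied by the platform.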


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/28d84751
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/28d84751
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/28d84751

Branch: refs/heads/master
Commit: 28d847513c1f435a114931c0c3cb51450ad42762
Parents: 08b01ec
Author: Grant Henke <granthenke@apache.org>
Authored: Sat Apr 21 14:03:11 2018 -0500
Committer: Grant Henke <granthenke@apache.org>
Committed: Mon Apr 23 19:39:24 2018 +0000

----------------------------------------------------------------------
 docs/developing.adoc                | 10 +++++++++-
 java/kudu-client-tools/build.gradle |  5 +----
 java/kudu-client-tools/pom.xml      | 15 ++++++---------
 java/kudu-spark-tools/build.gradle  |  5 +----
 java/kudu-spark-tools/pom.xml       |  5 +----
 5 files changed, 18 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/28d84751/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index 4d1c659..d0faa88 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -182,7 +182,9 @@ name and keytab location must be provided through the `--principal` and
   `Date` and complex types are not supported.
 - Kudu tables may only be registered as temporary tables in SparkSQL.
   Kudu tables may not be queried using HiveContext.
-
+- When importing or exporting Avro via the tools in `kudu-spark-tools`
+  the `spark-avro` dependency jars are expected to be on the classpath.
+  You can read more about `spark-avro` link:https://github.com/databricks/spark-avro[here].
 
 == Kudu Python Client
 The Kudu Python client provides a Python friendly interface to the C++ client API.
@@ -254,3 +256,9 @@ and
 link:https://github.com/apache/kudu/blob/master/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportCsv.java[ImportCsv.java]
 for examples which you can model your own integrations on. Stay tuned for more examples
 using YARN and Spark in the future.
+
+=== MapReduce Integration Known Issues and Limitations
+
+- When importing or exporting Parquet via the tools in `kudu-client-tools`
+  the `parquet-hadoop` dependency jars are expected to be on the classpath.
+  You can read more about `parquet-hadoop` link:https://github.com/apache/parquet-mr[here].
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/28d84751/java/kudu-client-tools/build.gradle
----------------------------------------------------------------------
diff --git a/java/kudu-client-tools/build.gradle b/java/kudu-client-tools/build.gradle
index 9cb53c2..b6c683d 100644
--- a/java/kudu-client-tools/build.gradle
+++ b/java/kudu-client-tools/build.gradle
@@ -22,11 +22,8 @@ dependencies {
   compile libs.guava
   compile libs.slf4jApi
 
-  // This artifact is in compile scope for convenience, as it is typically
-  // not included in the job classpath by MapReduce platform providers.
-  compile libs.parquetHadoop
-
   provided libs.hadoopClient
+  provided libs.parquetHadoop
 
   optional libs.yetusAnnotations
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/28d84751/java/kudu-client-tools/pom.xml
----------------------------------------------------------------------
diff --git a/java/kudu-client-tools/pom.xml b/java/kudu-client-tools/pom.xml
index 1c768de..35e8162 100644
--- a/java/kudu-client-tools/pom.xml
+++ b/java/kudu-client-tools/pom.xml
@@ -56,20 +56,18 @@
             <version>${slf4j.version}</version>
         </dependency>
 
-        <!-- This artifact is in compile scope for convenience, as it is typically
-             not included in the job classpath by MapReduce platform providers. -->
-        <dependency>
-            <groupId>org.apache.parquet</groupId>
-            <artifactId>parquet-hadoop</artifactId>
-            <version>${parquet.version}</version>
-        </dependency>
-
         <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-client</artifactId>
             <version>${hadoop.version}</version>
             <scope>provided</scope>
         </dependency>
+        <dependency>
+            <groupId>org.apache.parquet</groupId>
+            <artifactId>parquet-hadoop</artifactId>
+            <version>${parquet.version}</version>
+            <scope>provided</scope>
+        </dependency>
 
         <dependency>
             <groupId>org.apache.yetus</groupId>
@@ -129,7 +127,6 @@
                             <include>com.google.guava:guava</include>
                             <include>org.apache.kudu:kudu-client</include>
                             <include>org.apache.kudu:kudu-mapreduce</include>
-                            <include>org.apache.parquet:parquet-hadoop</include>
                         </includes>
                     </artifactSet>
                     <relocations>

http://git-wip-us.apache.org/repos/asf/kudu/blob/28d84751/java/kudu-spark-tools/build.gradle
----------------------------------------------------------------------
diff --git a/java/kudu-spark-tools/build.gradle b/java/kudu-spark-tools/build.gradle
index ebb3413..b32ab20 100644
--- a/java/kudu-spark-tools/build.gradle
+++ b/java/kudu-spark-tools/build.gradle
@@ -24,11 +24,8 @@ dependencies {
   compile project(path: ":kudu-spark", configuration: "shadow")
   compile libs.slf4jApi
 
-  // This artifact is in compile scope for convenience, as it is typically
-  // not included in the Spark submit classpath by Spark platform providers.
-  compile libs.sparkAvro
-
   provided libs.scalaLibrary
+  provided libs.sparkAvro
   provided libs.sparkCore
   provided libs.sparkSql
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/28d84751/java/kudu-spark-tools/pom.xml
----------------------------------------------------------------------
diff --git a/java/kudu-spark-tools/pom.xml b/java/kudu-spark-tools/pom.xml
index c4a9127..d97bac8 100644
--- a/java/kudu-spark-tools/pom.xml
+++ b/java/kudu-spark-tools/pom.xml
@@ -53,14 +53,12 @@
             <version>${slf4j.version}</version>
         </dependency>
 
-        <!-- This artifact is in compile scope for convenience, as it is typically
-             not included in the Spark submit classpath by Spark platform providers. -->
         <dependency>
             <groupId>com.databricks</groupId>
             <artifactId>spark-avro_${scala.binary.version}</artifactId>
             <version>${sparkavro.version}</version>
+            <scope>provided</scope>
         </dependency>
-
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-core_${scala.binary.version}</artifactId>
@@ -189,7 +187,6 @@
                             <include>org.apache.kudu:kudu-client</include>
                             <include>org.apache.kudu:kudu-client-tools</include>
                             <include>org.apache.kudu:kudu-${spark.version.label}_${scala.binary.version}</include>
-                            <include>com.databricks:spark-avro_${scala.binary.version}</include>
                         </includes>
                     </artifactSet>
                 </configuration>

