hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] eigakow opened a new issue #1398: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
Date Wed, 11 Mar 2020 16:31:13 GMT
eigakow opened a new issue #1398: [SUPPORT] DeltaStreamer - NoClassDefFoundError for HiveDriver
URL: https://github.com/apache/incubator-hudi/issues/1398
 
 
   **Describe the problem you faced**
   
   Using DeltaStreamer with `--enable-hive-sync` throws a `java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver` error.
   Should I change something in the default compilation process to include this class?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  Properties file:
   ```
   hoodie.datasource.write.recordkey.field=ts
   hoodie.datasource.write.partitionpath.field=ts
   hoodie.deltastreamer.schemaprovider.source.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   hoodie.deltastreamer.schemaprovider.target.schema.file=file:///home/director/me/hudi-0.5.1-incubating/schema.avro
   source-class=FR24JsonKafkaSource
   bootstrap.servers=streaming-kafka-broker-1:9092,streaming-kafka-broker-2:9092,streaming-kafka-broker-3:9092
   group.id=hudi_testing
   hoodie.deltastreamer.source.kafka.topic=fr-bru
   enable.auto.commit=false
   schemaprovider-class=org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   auto.offset.reset=earliest
   
   hoodie.datasource.hive_sync.database=fr24raw
   hoodie.datasource.hive_sync.table=test_hudi
   hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL
   hoodie.datasource.hive_sync.assume_date_partitioning=true
   hoodie.datasource.hive_sync.useJdbc=false
   ```
   2. Launch spark-submit with HoodieDeltaStreamer
   ```
   spark-submit --master yarn \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars $(pwd)/../my-app-1-jar-with-dependencies.jar \
     $(pwd)/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar \
     --props hdfs:///tmp/hudi-fr24.properties \
     --target-base-path adl://XXX.azuredatalakestore.net/test-hudi \
     --table-type MERGE_ON_READ --target-table test_hudi \
     --source-class FR24JsonKafkaSource \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --enable-hive-sync --continuous --source-limit 100
   ```
   **Expected behavior**
   
   Sync to Hive works.
   
   **Environment Description**
   
   * Hudi version : hudi-0.5.1-incubating
   
   * Spark version : 2.4.0-cdh6.1.0
   
   * Hive version : 2.1.1-cdh6.1.0
   
   * Hadoop version : 3.0.0-cdh6.1.0
   
   * Storage (HDFS/S3/GCS..) : ADLS
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   20/03/11 16:04:47 INFO cluster.YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
   20/03/11 16:04:47 INFO scheduler.DAGScheduler: ResultStage 37 (collect at HoodieMergeOnReadTableCompactor.java:208) finished in 0.679 s
   20/03/11 16:04:47 INFO scheduler.DAGScheduler: Job 12 finished: collect at HoodieMergeOnReadTableCompactor.java:208, took 0.680344 s
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total of 0 compactions are retrieved
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of latest files slices 4
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of log files 0
   20/03/11 16:04:47 INFO compact.HoodieMergeOnReadTableCompactor: Total number of file slices 4
   20/03/11 16:04:47 WARN compact.HoodieMergeOnReadTableCompactor: After filtering, Nothing to compact for adl://ecintpocdl.azuredatalakestore.net/FlightRadar24/test-hudi3
   20/03/11 16:04:47 INFO deltastreamer.DeltaSync: Syncing target hoodie table with hive table(test_hudi). Hive metastore URL :jdbc:hive2://master-1.bigdatapoc.local:10000/default;principal=hive/master-1.bigdatapoc.local@BIGDATAPOC.LOCAL, basePath :adl://XXX.azuredatalakestore.net/test-hudi
   20/03/11 16:04:47 INFO deltastreamer.HoodieDeltaStreamer: Delta Sync shutdown. Error ?false
   20/03/11 16:04:47 WARN deltastreamer.HoodieDeltaStreamer: Gracefully shutting down compactor
   20/03/11 16:05:00 INFO deltastreamer.HoodieDeltaStreamer: Compactor shutting down properly!!
   20/03/11 16:05:00 ERROR deltastreamer.AbstractDeltaStreamerService: Service shutdown with error
   java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
           at org.apache.hudi.utilities.deltastreamer.AbstractDeltaStreamerService.waitForShutdown(AbstractDeltaStreamerService.java:72)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:117)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:295)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.NoClassDefFoundError: org/apache/hive/jdbc/HiveDriver
           at org.apache.hudi.hive.HoodieHiveClient.<clinit>(HoodieHiveClient.java:80)
           at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:66)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:481)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:423)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:238)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:393)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
           at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
           ... 10 more
   ```
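
The innermost cause in the trace above is a `ClassNotFoundException` for `org.apache.hive.jdbc.HiveDriver`, so a first diagnostic step could be to check whether any jar passed to `spark-submit` actually contains that class. The following is a minimal sketch in Python, not part of Hudi; `jar_contains_class` is a hypothetical helper name, and it relies only on the fact that a jar is a zip archive whose entries mirror the package path.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar holds a .class entry for the fully qualified class name."""
    # A class such as org.apache.hive.jdbc.HiveDriver is stored in the jar
    # under the entry org/apache/hive/jdbc/HiveDriver.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call (the path is the bundle jar from the spark-submit command above,
# relative to the Hudi source checkout):
# jar_contains_class(
#     "packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.5.1-incubating.jar",
#     "org.apache.hive.jdbc.HiveDriver",
# )
```

If the check returns `False` for every jar on the command line, the class would have to come from the cluster's own classpath or from an additional jar; which of those applies here is not determined by the information in this report.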
   
   

