hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] melkimohamed opened a new issue #1439: [SUPPORT] Hudi class loading problem
Date Mon, 23 Mar 2020 15:41:33 GMT
URL: https://github.com/apache/incubator-hudi/issues/1439
 
 
   **Describe the problem you faced**
   I tested Hudi and everything works fine except count queries.
    When I run a count (`select count(*) from table;`), I always get the following error message, even though the Hudi library is loaded.
   ```
   Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
   ```
   
   **Note:** I am able to create Hudi tables manually, and the count query works on those; the problem occurs only with automatically created tables (Hive sync).
   Do you have any idea why the Hudi lib fails to load in Hive?
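   
   One diagnostic that could narrow this down (a suggestion, not something verified here: the stack trace below fails while Tez deserializes the query plan, so the failure may be Tez-specific) is to retry the count on the MapReduce engine:
   ```
   SET hive.execution.engine=mr;
   select count(*) from users_cor;
   ```
   If this succeeds, the jar is visible to HiveServer2 but not to the Tez containers.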
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   1.  Build the project (everything works well)
   I am using HDP 2.6.4 (Hive 2.1.0) with Hudi 0.5; I built the project with the steps below:
   ```
   git clone git@github.com:apache/incubator-hudi.git
   
    rm hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
   
   mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0
   ```
    In hive-site.xml I added the configuration below:
   ```
   hive.reloadable.aux.jars.path=/usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar
   ```
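   
   For reference, the shorthand above corresponds to this standard hive-site.xml property block (a sketch; the value is copied verbatim from above):
   ```
   <property>
     <name>hive.reloadable.aux.jars.path</name>
     <value>/usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar</value>
   </property>
   ```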
   
   2. Create a dataset and synchronize it with Hive (everything works well)
   ```
    export SPARK_MAJOR_VERSION=2
    spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars hdfs://mycluster/libs/hudi-spark-bundle-0.5.0-incubating.jar
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.functions._ 
   import org.apache.hudi.DataSourceWriteOptions 
   import org.apache.hudi.config.HoodieWriteConfig 
   import org.apache.hudi.hive.MultiPartKeysValueExtractor
   val inputDataPath = "hdfs://mycluster/apps/warehouse/test_acid.db/users_parquet"
   val hudiTableName = "users_cor"
   val hudiTablePath = "hdfs://mycluster/apps/warehouse/" + hudiTableName
   
   val hudiOptions = Map[String,String](
    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year", 
    HoodieWriteConfig.TABLE_NAME -> hudiTableName, 
    DataSourceWriteOptions.OPERATION_OPT_KEY ->
    DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL, 
    DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
    DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://........:10000/;principal=...",
    DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
    DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "default",
    DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true", 
    DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName, 
    DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year", 
    DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
    DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
    )
    val inputDF = spark.read.format("parquet").load(inputDataPath)
    inputDF.write.format("org.apache.hudi").
        options(hudiOptions).
        mode(SaveMode.Overwrite).
        save(hudiTablePath)
   ```
   **==> all work fine**
   
   3. Update data (everything works well)
   ```
    val designation = "Account Coordinator"
    val requestToUpdate = "Account Executive"
    val sqlStatement = s"SELECT count(*) FROM default.users_cor WHERE designation = '$requestToUpdate'"
    spark.sql(sqlStatement).show()
    val updateDF = inputDF.filter(col("designation") === requestToUpdate).withColumn("designation", lit("Account Executive"))
   
   updateDF.write.format("org.apache.hudi").
   options(hudiOptions).option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL).
   mode(SaveMode.Append).
   save(hudiTablePath);
   ```
   
   4. DESCRIBE TABLE (Everything works well)
   ```
   DESCRIBE FORMATTED  users_cor;
   ```
   ```
   ...
   | SerDe Library:  | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    |
   | InputFormat:    | org.apache.hudi.hadoop.HoodieParquetInputFormat                |
   | OutputFormat:   | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat |
   ```
   
   5.  Count rows (Problem)
   
   ```
   select count(*) from users_mor;
   
   20-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
   Serialization trace:
   inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
   aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
           at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
           at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
           at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
           ... 30 more
   Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
   ```
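   
   A session-level cross-check that may help (ADD JAR is standard HiveQL; the local path mirrors the aux-jars setting above and is otherwise an assumption about where the bundle lives on the HiveServer2 host):
   ```
   ADD JAR /usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar;
   select count(*) from users_mor;
   ```
   If the count works after ADD JAR, the bundle is simply not reaching the execution engine's classpath via the reloadable-aux-jars setting.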
   
   
   **Environment Description** 
   
   * Hudi version : 0.5.0-incubating
   
   * Spark version : 2.2.0
   
   * Hive version : 2.1.0
   
   * Hadoop version : 2.7.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : NO
   
   
   **Additional context**
   
   I verified that the Hudi library is loaded by creating Hudi tables manually and synchronizing them with Hive.
   
   **Stacktrace**
   
   ```
   select count(*) from users_mor;
   INFO  : Tez session hasn't been created yet. Opening session
   INFO  : Dag name: select count (*) from users_mor(Stage-1)
   ERROR : Status: Failed
   ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1579091723876_115812_1_00, diagnostics=[Vertex vertex_1579091723876_115812_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
           at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
           at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
           at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
           at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
           at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
           at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
           at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4620)
           at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4400(VertexImpl.java:202)
           at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3436)
           at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3385)
           at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3366)
           at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
           at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
           at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
           at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
           at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
           at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1938)
           at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
           at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2081)
           at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2067)
           at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
           at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
           at java.lang.Thread.run(Thread.java:745)
   Caused by: java.lang.reflect.InvocationTargetException
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
           at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
           ... 25 more
   Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ihadcluster02/tmp/hive/X183677/420271d6-4a80-4894-92d6-fb6ff73b3983/hive_2020-03-23_16-09-20_722_7229255051541187826-1886/3e2bc38c-1cf9-4d96-b90c-83fd9dd4d277/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.hudi.hadoop.HoodieParquetInputFormat
   Serialization trace:
   inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
   aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
           at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:484)
           at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:323)
           at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
           ... 30 more
   ```
   
   
