From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] melkimohamed opened a new issue #1376: Problem Sync Hudi table with Hive
Date Thu, 05 Mar 2020 15:41:56 GMT
melkimohamed opened a new issue #1376: Problem  Sync Hudi table with Hive
URL: https://github.com/apache/incubator-hudi/issues/1376
 
 
   **Problem: Hudi 0.5 with Hive 2.1.0**
   
   
   **Describe the problem you faced**
   
    I am using Hudi 0.5 with Spark 2.2 and Hive 2.1.0, and I always hit the same problem: it is not possible to sync the Hudi table with Hive.
   I suspect that Hudi 0.5 is not compatible with Hive 2.1.0; could you please confirm?
   
   
   **To Reproduce**
   
   On my cluster I use the two Hudi jars:
   - hudi-spark-bundle-0.5.0-incubating.jar
   - hudi-hive-bundle-0.5.0-incubating.jar

   and I launch the Spark shell with:

   spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false" --jars hdfs://dhadcluster02/libs/hudi-spark-bundle-0.5.0-incubating.jar,hdfs://dhadcluster02/libs/spark-avro_2.11-2.4.4.jar
   
   ```
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.functions._ 
   import org.apache.hudi.DataSourceWriteOptions 
   import org.apache.hudi.config.HoodieWriteConfig 
   import org.apache.hudi.hive.MultiPartKeysValueExtractor
   
   val inputDataPath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet"
   val hudiTableName = "employee_parquet_hudi"
   val hudiTablePath = "hdfs://mycluster/apps/hive/warehouse/testhudi.db/employee_parquet_hudi"
   
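   // Write options: "id" is the record key, "year" serves as both the
   // partition path and the precombine field, and the HIVE_* options
   // configure the hive sync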
   val hudiOptions = Map[String,String](
    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "year", 
    HoodieWriteConfig.TABLE_NAME -> hudiTableName, 
 DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
    DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "year",
    DataSourceWriteOptions.HIVE_URL_OPT_KEY -> "jdbc:hive2://host:10000/defaut;principal=hive/host@REALM",
    DataSourceWriteOptions.HIVE_USER_OPT_KEY -> "hive",
    DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "testhudi",
    DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true", 
    DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName, 
    DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "year", 
    DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
    DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName
    )
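   // Read the source parquet files and write them out as a Hudi table;
   // hive sync runs at the end of the write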
   val temp = spark.read.format("parquet").load(inputDataPath)
   temp.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
   ```
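
   As a sanity check, the data itself should be readable back through the datasource, since the write commits before the hive-sync step fails. A sketch; the glob pattern is an assumption (one level for the `year` partition directories plus one for the files, following the Hudi 0.5 quickstart):

   ```
   // Sanity check (sketch): read the freshly written Hudi files back
   // directly, bypassing the Hive metastore entirely.
   val readBack = spark.read
     .format("org.apache.hudi")
     .load(hudiTablePath + "/*/*")
   readBack.groupBy("year").count().show()
   ```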
   
   
   **Expected behavior**
   
   Sync the Hudi table with Hive (i.e., create the table and its partitions in the Hive metastore).
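
   A successful sync would be observable from the Spark shell roughly like this (a sketch, assuming the session was started with Hive support enabled):

   ```
   // What a successful hive sync should produce (sketch): the table and
   // its "year" partitions registered in the metastore.
   spark.sql("SHOW TABLES IN testhudi").show()
   spark.sql("SHOW PARTITIONS testhudi.employee_parquet_hudi").show()
   ```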
   
   **Environment Description**
   
   * Hudi version : 0.5
   
   * Spark version : 2.2.0
   
   * Hive version : 2.1.0
   
   * Hadoop version : 2.7.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : NO
   
   
   **Additional context**
   
   I suspect that the problem is that Hudi 0.5 is not compatible with Hive 2.1.0, because I see `<hive.version>2.3.1</hive.version>` in the pom.xml. So I tried to build the Hudi project against Hive 2.1.0 with `mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0`, but I ran into another error.
   
   
   **Stacktrace**
   
   ```
   20/03/05 15:29:15 WARN HoodieSparkSqlWriter$: hoodie dataset at hdfs://dhadcluster02/apps/hive/warehouse/testhudi.db/employee_parquet_hudi already exists. Deleting existing data & overwriting with new data.
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if table exists employee_parquet_hudi
     at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:459)
     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
     at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
     at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
     at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
     at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
     at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
     ... 62 elided
   Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
     at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
     at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
     at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
     at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
     at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
     ... 83 more
   ```
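
   The root cause seems to be the last frames: `TApplicationException: Invalid method name: 'get_table_req'` means the metastore server does not implement an RPC that the client is calling. The relocated package names (`org.apache.hudi.org.apache.hadoop_hive.metastore`) show the call comes from the Hive 2.3.x client shaded into the Hudi bundle, and `get_table_req` is evidently not part of the thrift interface exposed by my Hive 2.1.0 metastore, so `HoodieHiveClient.doesTableExist` fails before any sync work starts. As a contrast, the same existence check through Spark's own catalog, which reaches the metastore with the cluster's Hive client rather than the bundled one, goes through (a sketch, assuming the session has Hive support):

   ```
   // Contrast check (sketch): Spark's catalog uses the cluster's own Hive
   // client libraries, not Hudi's shaded 2.3.x client, so this returns
   // normally (false here, since the sync never created the table) instead
   // of throwing "Invalid method name: 'get_table_req'".
   spark.catalog.tableExists("testhudi", "employee_parquet_hudi")
   ```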
   
   
