hudi-commits mailing list archives

From "Udit Mehrotra (Jira)" <j...@apache.org>
Subject [jira] [Created] (HUDI-607) Hive sync fails to register tables partitioned by Date Type column
Date Thu, 13 Feb 2020 02:03:00 GMT
Udit Mehrotra created HUDI-607:
----------------------------------

             Summary: Hive sync fails to register tables partitioned by Date Type column
                 Key: HUDI-607
                 URL: https://issues.apache.org/jira/browse/HUDI-607
             Project: Apache Hudi (incubating)
          Issue Type: Bug
          Components: Hive Integration
            Reporter: Udit Mehrotra


h2. Issue Description

As part of the Spark-to-Avro conversion, Spark's *Date* type is represented as the *Date
Logical Type* in Avro, whose underlying physical type is *Integer*.
For this reason, when Avro records are formed from Spark rows, each date is converted to its
*epoch day* and stored as an *Integer* value in the parquet files.
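
To make the epoch-day representation concrete, here is a minimal sketch of the conversion in plain Python. It is not Hudi or Spark code; the helper names {{to_epoch_day}} and {{from_epoch_day}} are hypothetical and only illustrate that an epoch day is the number of days since 1970-01-01.

```python
from datetime import date, timedelta

# Day zero of the epoch-day count used by the Avro date logical type.
EPOCH = date(1970, 1, 1)


def to_epoch_day(d: date) -> int:
    """Number of days between 1970-01-01 and d (hypothetical helper)."""
    return (d - EPOCH).days


def from_epoch_day(days: int) -> date:
    """Inverse of to_epoch_day (hypothetical helper)."""
    return EPOCH + timedelta(days=days)


print(to_epoch_day(date(2019, 1, 1)))     # 17897
print(from_epoch_day(17897).isoformat())  # 2019-01-01
```

So the integer *17897* seen in the partition paths corresponds to the date *2019-01-01*.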

However, this becomes a problem when a *Date Type* column is chosen as the partition
column. In this case, Hudi's partition column *_hoodie_partition_path* also gets the
*epoch day integer* value when the partition field is read from the Avro record. As a
result, syncing partitions for the Hudi table issues a command like the following, where the date
is an integer:
{noformat}
ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS   PARTITION (event_date='17897')
LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
  PARTITION (event_date='17898') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
  PARTITION (event_date='17899') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
  PARTITION (event_date='17900') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
Hive cannot make sense of partition field values like *17897* because it cannot
convert such a string to the corresponding date. It expects the actual date
to be represented in string form.
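
The way the partition value ends up in both the partition spec and the location can be sketched as follows. This is an illustration only, not the code Hudi uses to build the statement; the function {{add_partitions_sql}} and the database, table and bucket names in the usage are hypothetical.

```python
def add_partitions_sql(database, table, field, base_path, values):
    """Build an ALTER TABLE ... ADD PARTITION statement (hypothetical helper).

    Each partition value is used twice: once as the partition field value
    and once as the last component of the partition location.
    """
    clauses = [
        f"PARTITION ({field}='{value}') LOCATION '{base_path}/{value}'"
        for value in values
    ]
    return f"ALTER TABLE {database}.{table} ADD IF NOT EXISTS " + " ".join(clauses)


# With epoch-day partition paths, the integers leak into the statement.
print(add_partitions_sql("db", "events", "event_date", "s3://bucket/events", ["17897", "17898"]))
```

Whatever string is stored as the partition path is passed through unchanged, so the value has to be a date string before it reaches this step.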

So, we need to make sure that Hudi's partition field gets the actual date value in string
form instead of the integer. This change makes sure that when a field's value is retrieved
from the Avro record, we check whether it is of the *Date Logical Type* and, if so, return the actual date value
instead of the epoch day. After this change, the command issued to sync partitions looks like:
{noformat}
ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS   PARTITION
(`event_date`='2019-01-01') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
  PARTITION (`event_date`='2019-01-02') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
  PARTITION (`event_date`='2019-01-03') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
  PARTITION (`event_date`='2019-01-04') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
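
The check described above can be sketched roughly as follows. This is a simplified illustration in plain Python and not the actual Hudi change: the function {{partition_value}} is hypothetical, and the field schema is modelled as the parsed JSON form of an Avro schema (a dict, a type name, or a list for a union) rather than through an Avro library.

```python
from datetime import date, timedelta


def partition_value(value, field_schema):
    """Return the string to use in the partition path (hypothetical helper).

    field_schema is the parsed JSON form of the field's Avro schema, e.g.
    {"type": "int", "logicalType": "date"} or a nullable union such as
    ["null", {"type": "int", "logicalType": "date"}].
    """
    # Treat a non-union schema as a union with a single branch.
    branches = field_schema if isinstance(field_schema, list) else [field_schema]
    for branch in branches:
        is_date = (
            isinstance(branch, dict)
            and branch.get("type") == "int"
            and branch.get("logicalType") == "date"
        )
        if is_date and isinstance(value, int):
            # Epoch day -> ISO date string, e.g. 17897 -> "2019-01-01".
            return (date(1970, 1, 1) + timedelta(days=value)).isoformat()
    # Any other type keeps its plain string form.
    return str(value)


print(partition_value(17897, {"type": "int", "logicalType": "date"}))  # 2019-01-01
print(partition_value(17897, "int"))                                   # 17897
```

A plain integer column is left untouched; only fields carrying the date logical type are converted, which matches the intent of returning the actual date instead of the epoch day.
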
h2. Stack Trace
{noformat}
20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, listing all
partitions in s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS
:com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 18209, 18210, 18211,
18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 18220, 18221, 18222, 18223, 18224,
18225, 18226, 18227, 18228, 18229, 18230, 18231, 18232, 18233, 18234, 18235, 18236]
20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar
ADD IF NOT EXISTS   PARTITION (dim_date='18206') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
  PARTITION (dim_date='18207') LOCATION $
s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'   PARTITION
(dim_date='18208') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
  PARTITION (dim_date='18209') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_$
n_read_aws_hudi_jar/18209'   PARTITION (dim_date='18210') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
  PARTITION (dim_date='18211') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
  PARTITION (dim_date='18212') L$
CATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
  PARTITION (dim_date='18213') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
  PARTITION (dim_date='18214') LOCATION 's3://feichi-test/fact_hourly_search_term_conversion$
/merge_on_read_aws_hudi_jar/18214'   PARTITION (dim_date='18215') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
  PARTITION (dim_date='18216') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
  PARTITION (dim_date='1$
217') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
  PARTITION (dim_date='18218') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
  PARTITION (dim_date='18219') LOCATION 's3://feichi-test/fact_hourly_search_term_co$
versions/merge_on_read_aws_hudi_jar/18219'   PARTITION (dim_date='18220') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
  PARTITION (dim_date='18221') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
  PARTITION (dim$
date='18222') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
  PARTITION (dim_date='18223') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
  PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
term_conversions/merge_on_read_aws_hudi_jar/18224'   PARTITION (dim_date='18225') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'  
PARTITION (dim_date='18226') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
  PARTIT$
ON (dim_date='18227') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
  PARTITION (dim_date='18228') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
  PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
_search_term_conversions/merge_on_read_aws_hudi_jar/18229'   PARTITION (dim_date='18230')
LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
  PARTITION (dim_date='18231') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
 PARTITION (dim_date='18232') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
  PARTITION (dim_date='18233') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
  PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'   PARTITION (dim_date='18235')
LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
  PARTITION (dim_date='18236') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
18236'
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
  at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
