hudi-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinoth Chandar (Jira)" <j...@apache.org>
Subject [jira] [Closed] (HUDI-607) Hive sync fails to register tables partitioned by Date Type column
Date Mon, 09 Mar 2020 16:49:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vinoth Chandar closed HUDI-607.
-------------------------------
    Resolution: Fixed

> Hive sync fails to register tables partitioned by Date Type column
> ------------------------------------------------------------------
>
>                 Key: HUDI-607
>                 URL: https://issues.apache.org/jira/browse/HUDI-607
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Udit Mehrotra
>            Assignee: Udit Mehrotra
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h2. Issue Description
> As part of spark to avro conversion, Spark's *Date* type is represented as corresponding
*Date Logical Type* in Avro, which is underneath represented in Avro by physical *Integer*
type. For this reason when forming the Avro records from Spark rows, it is converted to corresponding
*Epoch day* to be stored as corresponding *Integer* value in the parquet files.
> However, this manifests into a problem that when a *Date Type* column is chosen as partition
column. In this case, Hudi's partition column *_hoodie_partition_path* also gets the corresponding
*epoch day integer* value when reading the partition field from the avro record, and as a
result syncing partitions in hudi table issues a command like the following, where the date
is an integer:
> {noformat}
> ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS   PARTITION
(event_date='17897') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
  PARTITION (event_date='17898') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
  PARTITION (event_date='17899') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
  PARTITION (event_date='17900') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
> Hive is not able to make sense of the partition field values like *17897* as it is not
able to convert it to corresponding date from this string. It actually expects the actual
date to be represented in string form.
> So, we need to make sure that Hudi's partition field gets the actual date value in string
form, instead of the integer. This change makes sure that when a fields value is retrieved
from the Avro record, we check that if its *Date Logical Type* we return the actual date value,
instead of the epoch. After this change the command for sync partitions issues is like:
> {noformat}
> ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS   PARTITION
(`event_date`='2019-01-01') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
  PARTITION (`event_date`='2019-01-02') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
  PARTITION (`event_date`='2019-01-03') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
  PARTITION (`event_date`='2019-01-04') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
> h2. Stack Trace
> {noformat}
> 20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, listing
all partitions in s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS
:com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
> 20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
> 20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 18209, 18210,
18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 18220, 18221, 18222, 18223,
18224, 18225, 18226, 18227, 18228, 18229, 18230, 18231, 18232, 18233, 18234, 18235, 18236]
> 20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
> 20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar
ADD IF NOT EXISTS   PARTITION (dim_date='18206') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
  PARTITION (dim_date='18207') LOCATION $
> s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
  PARTITION (dim_date='18208') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
  PARTITION (dim_date='18209') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_$
> n_read_aws_hudi_jar/18209'   PARTITION (dim_date='18210') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
  PARTITION (dim_date='18211') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
  PARTITION (dim_date='18212') L$
> CATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
  PARTITION (dim_date='18213') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
  PARTITION (dim_date='18214') LOCATION 's3://feichi-test/fact_hourly_search_term_conversion$
> /merge_on_read_aws_hudi_jar/18214'   PARTITION (dim_date='18215') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
  PARTITION (dim_date='18216') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
  PARTITION (dim_date='1$
> 217') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
  PARTITION (dim_date='18218') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
  PARTITION (dim_date='18219') LOCATION 's3://feichi-test/fact_hourly_search_term_co$
> versions/merge_on_read_aws_hudi_jar/18219'   PARTITION (dim_date='18220') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
  PARTITION (dim_date='18221') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
  PARTITION (dim$
> date='18222') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
  PARTITION (dim_date='18223') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
  PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
> term_conversions/merge_on_read_aws_hudi_jar/18224'   PARTITION (dim_date='18225') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'  
PARTITION (dim_date='18226') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
  PARTIT$
> ON (dim_date='18227') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
  PARTITION (dim_date='18228') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
  PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
> _search_term_conversions/merge_on_read_aws_hudi_jar/18229'   PARTITION (dim_date='18230')
LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
  PARTITION (dim_date='18231') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
>  PARTITION (dim_date='18232') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
  PARTITION (dim_date='18233') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
  PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
> t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'   PARTITION (dim_date='18235')
LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
  PARTITION (dim_date='18236') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
> 18236'
> org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
>   at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message