Date: Thu, 13 Feb 2020 02:17:00 +0000 (UTC)
From: "ASF GitHub Bot (Jira)"
To: commits@hudi.apache.org
Subject: [jira] [Updated] (HUDI-607) Hive sync fails to register tables partitioned by Date Type column

     [ https://issues.apache.org/jira/browse/HUDI-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-607:
--------------------------------
    Labels: pull-request-available  (was: )
> Hive sync fails to register tables partitioned by Date Type column
> ------------------------------------------------------------------
>
>                 Key: HUDI-607
>                 URL: https://issues.apache.org/jira/browse/HUDI-607
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Udit Mehrotra
>            Assignee: Udit Mehrotra
>            Priority: Major
>              Labels: pull-request-available
>
> h2. Issue Description
> As part of Spark-to-Avro conversion, Spark's *Date* type is represented as the corresponding *Date Logical Type* in Avro, which is backed by the physical *Integer* type. For this reason, when Avro records are formed from Spark rows, each date is converted to its *epoch day* and stored as the corresponding *Integer* value in the parquet files.
> However, this becomes a problem when a *Date Type* column is chosen as the partition column. In that case, Hudi's partition column *_hoodie_partition_path* also receives the corresponding *epoch day integer* value when the partition field is read from the Avro record, and as a result syncing partitions in the Hudi table issues a command like the following, where the date is an integer:
> {noformat}
> ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS
>   PARTITION (event_date='17897') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
>   PARTITION (event_date='17898') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
>   PARTITION (event_date='17899') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
>   PARTITION (event_date='17900') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
> Hive cannot make sense of partition field values like *17897*, because it cannot convert such a string to the corresponding date.
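As a sanity check on the values above, the mapping between Avro's epoch-day integers and calendar dates can be verified with a short sketch (plain Python stdlib, not Hudi code; an Avro Date logical-type value is defined as the number of days since 1970-01-01):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # Avro Date logical type counts days from the Unix epoch

def epoch_day_to_date(days: int) -> str:
    """Decode an Avro Date logical-type value (days since 1970-01-01) to ISO form."""
    return (EPOCH + timedelta(days=days)).isoformat()

# The integers from the broken ALTER TABLE statement decode to real dates:
print(epoch_day_to_date(17897))  # 2019-01-01
print(epoch_day_to_date(17900))  # 2019-01-04
```

This matches the corrected ALTER TABLE statement further below, where the same partitions appear as `2019-01-01` through `2019-01-04`.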
> It actually expects the date itself to be represented in string form.
> So, we need to make sure that Hudi's partition field gets the actual date value in string form instead of the integer. This change ensures that when a field's value is retrieved from the Avro record, we check whether it is of *Date Logical Type* and, if so, return the actual date value instead of the epoch day. After this change, the command issued for syncing partitions looks like:
> {noformat}
> ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS
>   PARTITION (`event_date`='2019-01-01') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
>   PARTITION (`event_date`='2019-01-02') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
>   PARTITION (`event_date`='2019-01-03') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
>   PARTITION (`event_date`='2019-01-04') LOCATION 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
> h2. Stack Trace
> {noformat}
> 20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, listing all partitions in s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar, FS: com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
> 20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
> 20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230, 18231, 18232, 18233, 18234, 18235, 18236]
> 20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
> 20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT EXISTS
>   PARTITION (dim_date='18206') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
>   PARTITION (dim_date='18207') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
>   PARTITION (dim_date='18208') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
>   PARTITION (dim_date='18209') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18209'
>   PARTITION (dim_date='18210') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
>   PARTITION (dim_date='18211') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
>   PARTITION (dim_date='18212') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
>   PARTITION (dim_date='18213') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
>   PARTITION (dim_date='18214') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18214'
>   PARTITION (dim_date='18215') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
>   PARTITION (dim_date='18216') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
>   PARTITION (dim_date='18217') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
>   PARTITION (dim_date='18218') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
>   PARTITION (dim_date='18219') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18219'
>   PARTITION (dim_date='18220') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
>   PARTITION (dim_date='18221') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
>   PARTITION (dim_date='18222') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
>   PARTITION (dim_date='18223') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
>   PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18224'
>   PARTITION (dim_date='18225') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'
>   PARTITION (dim_date='18226') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
>   PARTITION (dim_date='18227') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
>   PARTITION (dim_date='18228') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
>   PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18229'
>   PARTITION (dim_date='18230') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
>   PARTITION (dim_date='18231') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
>   PARTITION (dim_date='18232') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
>   PARTITION (dim_date='18233') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
>   PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'
>   PARTITION (dim_date='18235') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
>   PARTITION (dim_date='18236') LOCATION 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18236'
> org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
>   at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)