hudi-commits mailing list archives

From "Cosmin Iordache (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HUDI-83) Support for timestamp datatype in Hudi
Date Fri, 13 Mar 2020 11:05:00 GMT

    [ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058635#comment-17058635 ]

Cosmin Iordache commented on HUDI-83:
-------------------------------------

I looked at how Hudi saves data with Spark 2.4.4, and things have changed.

Decimal types are now saved correctly, and so are timestamps.

Example of an inferred timestamp column being read back after it was saved with Hudi:
{code:java}
scala> val q3 = spark.read.format("org.apache.hudi").load("hdfs://namenode:8020/data/lake/d3325f10-4a91-4b19-872b-5be019c4836a/converted/*/*")
5651992 [main] WARN  org.apache.hudi.DefaultSource  - Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
q3: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]

scala> q3.show()

scala> q3.select("other_date","timestamp_1").show
+-------------------+-------------------+
|         other_date|        timestamp_1|
+-------------------+-------------------+
|2017-09-17 00:00:00|2017-01-01 00:00:00|
|2017-09-16 00:00:00|2017-01-01 00:00:00|
+-------------------+-------------------+
scala> q3.select("other_date","timestamp_1").dtypes
res6: Array[(String, String)] = Array((other_date,TimestampType), (timestamp_1,TimestampType))

{code}
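Since these type changes raise a backward-compatibility question, a quick check after reading a table back can flag any column whose Spark type differs from what downstream code expects. Below is a minimal sketch in plain Scala. It assumes only the `Array[(String, String)]` shape that `DataFrame.dtypes` returns in the transcript. `DtypeCheck` and `mismatches` are hypothetical names and are not part of Hudi or Spark.

```scala
// Hypothetical helper (not a Hudi or Spark API): compares the
// (column, typeName) pairs returned by DataFrame.dtypes with the
// type names a caller expects after a write/read round trip.
object DtypeCheck {
  /** For every expected column whose actual type differs, returns the
    * actual type name, or None if the column is missing entirely.
    * An empty result means every expected column matched.
    * If `actual` contains duplicate column names, the last one wins. */
  def mismatches(
      actual: Array[(String, String)],
      expected: Map[String, String]
  ): Map[String, Option[String]] = {
    val actualTypes = actual.toMap
    expected.collect {
      case (col, want) if !actualTypes.get(col).contains(want) =>
        col -> actualTypes.get(col)
    }
  }
}
```

Applied to the `dtypes` output shown above, `DtypeCheck.mismatches(Array(("other_date", "TimestampType"), ("timestamp_1", "TimestampType")), Map("other_date" -> "TimestampType", "timestamp_1" -> "TimestampType"))` returns an empty map, meaning both columns came back as `TimestampType`.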
And for decimals:
{code:java}
scala> val q2 = spark.read.format("org.apache.hudi").load("hdfs://namenode:8020/data/lake/5a3d9896-b331-4b5d-8638-5d72e02edd34/converted/*/*")
6221463 [main] WARN  org.apache.hudi.DefaultSource  - Snapshot view not supported yet via data source, for MERGE_ON_READ tables. Please query the Hive table registered using Spark SQL.
q2: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 40 more fields]
scala> q2.select("LIMIT_BAL").show()
+---------+
|LIMIT_BAL|
+---------+
|   260000|
|   110000|
|    50000|
....
scala> q2.select("LIMIT_BAL").dtypes
res10: Array[(String, String)] = Array((LIMIT_BAL,DecimalType(6,0)))
{code}
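In Spark, `DecimalType(6,0)` allows at most 6 digits in total, with 0 of them after the decimal point. Every `LIMIT_BAL` value must therefore be an integer with an absolute value below 1,000,000. The following minimal sketch checks whether a value fits a given `DecimalType(precision, scale)` without rounding. It uses only `scala.math.BigDecimal`, and `DecimalFit.fits` is a hypothetical helper, not a Spark or Hudi API.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): does `value` fit in a Spark
// DecimalType(precision, scale) without rounding or overflow?
object DecimalFit {
  def fits(value: BigDecimal, precision: Int, scale: Int): Boolean =
    // UNNECESSARY throws ArithmeticException if digits would be dropped.
    Try(value.setScale(scale, BigDecimal.RoundingMode.UNNECESSARY))
      .toOption
      // After fixing the scale, the total digit count must not exceed precision.
      .exists(_.precision <= precision)
}
```

For example, `DecimalFit.fits(BigDecimal(260000), 6, 0)` is `true`. `DecimalFit.fits(BigDecimal(1000000), 6, 0)` is `false`, because 1000000 has seven digits.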
This does introduce a backward-compatibility issue, though.

> Support for timestamp datatype in Hudi
> --------------------------------------
>
>                 Key: HUDI-83
>                 URL: https://issues.apache.org/jira/browse/HUDI-83
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Usability
>            Reporter: Vinoth Chandar
>            Priority: Major
>             Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
