hive-issues mailing list archives

From "Bryan Cutler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
Date Mon, 04 Jun 2018 21:47:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500903#comment-16500903 ]

Bryan Cutler commented on HIVE-19723:
-------------------------------------

> My understanding is that since the primary use-case for ArrowUtils is Python integration,
> some of the conversions are currently somewhat particular for Python. Perhaps Python/Pandas
> only supports MICROSECOND timestamps.

Python, with pandas and pyarrow, supports timestamps down to nanoseconds.  The reason for
using microseconds in Spark {{ArrowUtils}} is to match Spark's internal representation,
which is in microseconds.  This avoids any further conversions once the data is read into
the Spark JVM.
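
To make the unit difference concrete, here is a minimal sketch using only the Python standard library. It does not call pyarrow, Spark, or Hive, and the helper names {{nanos_to_micros}} and {{micros_to_datetime}} are hypothetical, chosen only for illustration. It assumes timestamps are plain integers counted from the Unix epoch and shows that narrowing from nanoseconds to microseconds drops the last three decimal digits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_micros(epoch_nanos: int) -> int:
    """Narrow an epoch timestamp from nanoseconds to microseconds.

    Floor division discards the sub-microsecond digits (hypothetical helper).
    """
    return epoch_nanos // 1_000


def micros_to_datetime(epoch_micros: int) -> datetime:
    """Interpret an epoch timestamp in microseconds as a UTC datetime."""
    return EPOCH + timedelta(microseconds=epoch_micros)


# Build an example instant with nanosecond precision using integer arithmetic.
base = datetime(2018, 6, 4, 21, 47, 0, tzinfo=timezone.utc)
delta = base - EPOCH
epoch_seconds = delta.days * 86_400 + delta.seconds
nanos = epoch_seconds * 1_000_000_000 + 123_456_789

micros = nanos_to_micros(nanos)
lost_nanos = nanos - micros * 1_000  # digits that do not survive the narrowing

print(micros_to_datetime(micros).isoformat())
print("sub-microsecond nanoseconds dropped:", lost_nanos)
```

Python's own {{datetime}} type happens to carry microsecond resolution as well, which is why the sketch keeps the nanosecond value as a bare integer.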

> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -----------------------------------------------------------------
>
>                 Key: HIVE-19723
>                 URL: https://issues.apache.org/jira/browse/HIVE-19723
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>
>         Attachments: HIVE-19723.1.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. Spark 2.3.0
> won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need to change
> the assertion to test microsecond. And we'll need to add this to documentation on supported
> datatypes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
