hive-issues mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison
Date Fri, 14 Apr 2017 12:18:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968955#comment-15968955 ]

Rui Li commented on HIVE-16418:
-------------------------------

[~xuefuz] has talked about this with me offline. Let me use an example for the discussion.
Suppose the system TZ is GMT+5. Now a user stores a TimestampTZ of '2017-04-14 18:00:00 GMT+8'.
We have the following choices to store it:
# Store as '2017-04-14 18:00:00 GMT+8'. This is my original plan, which I think is closest
to the user's expectation: you store some TimestampTZ and when you select it, you get the same
data displayed. It fixes both {{to_utc_timestamp}} and {{from_utc_timestamp}}. But this way we
need to store the TZ part, and thus take on all the complexity that comes with it.
# Store as '2017-04-14 10:00:00 GMT'. This means all TimestampTZ values will display using
the GMT timezone. It's much simpler because we don't have to store the TZ, and we can reuse
most of the code, like TimestampWritable. The shortcoming is that we discard the TZ info in the
user's input. More importantly, it's difficult to fix the {{from_utc_timestamp}} UDF. This UDF
converts a timestamp in UTC to a user-specified timezone, so its return type should of course
be TimestampTZ. But if all TimestampTZ values display in UTC, the UDF effectively becomes
useless. So with this choice, I guess we have to leave {{from_utc_timestamp}} as is.
# Store as '2017-04-14 15:00:00 GMT+5'. It's similar to #2 but uses the system TZ.
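
To make the three choices concrete, here is a rough sketch using plain {{java.time}}. It is not Hive code and not what the patch does: the class name, the method names and the {{render}} display format are all made up for illustration, and it only handles whole-hour offsets.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustration only: hypothetical names, not part of Hive or of HIVE-16418.
public class TimestampTzChoices {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT);

    // Choice #1: keep the offset the user supplied.
    static OffsetDateTime keepUserZone(OffsetDateTime input) {
        return input;
    }

    // Choice #2: same instant, normalized to GMT/UTC, so no TZ has to be stored.
    static OffsetDateTime normalizeToGmt(OffsetDateTime input) {
        return input.withOffsetSameInstant(ZoneOffset.UTC);
    }

    // Choice #3: same instant, shown in the system TZ (passed in explicitly here).
    static OffsetDateTime normalizeToSystemZone(OffsetDateTime input, ZoneOffset systemOffset) {
        return input.withOffsetSameInstant(systemOffset);
    }

    // Assumed display format mimicking the strings in the list above;
    // whole-hour offsets only.
    static String render(OffsetDateTime t) {
        int hours = t.getOffset().getTotalSeconds() / 3600;
        String zone = hours == 0 ? "GMT" : String.format(Locale.ROOT, "GMT%+d", hours);
        return FMT.format(t) + " " + zone;
    }

    public static void main(String[] args) {
        OffsetDateTime input =
            OffsetDateTime.of(2017, 4, 14, 18, 0, 0, 0, ZoneOffset.ofHours(8));
        System.out.println(render(keepUserZone(input)));
        System.out.println(render(normalizeToGmt(input)));
        System.out.println(render(normalizeToSystemZone(input, ZoneOffset.ofHours(5))));
    }
}
```

All three values denote the same instant; they differ only in what is displayed. Only #1 round-trips the user's offset, and #3 changes its output depending on the offset of the cluster that runs the query.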

If #1 is unacceptable due to the complexity, I prefer #2. #3 seems to introduce unnecessary
ambiguity: if you run the same query on clusters in different TZs, you'll get different results.
What do you guys think?

> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>
>                 Key: HIVE-16418
>                 URL: https://issues.apache.org/jira/browse/HIVE-16418
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16418.1.patch
>
>
> The feature is required when we have to serialize some fields and prevent them from being
> used in comparison, e.g. HIVE-14412.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
