hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Haozhun Jin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-12192) Hive should carry out timestamp computations in UTC
Date Wed, 04 Apr 2018 23:03:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426302#comment-16426302
] 

Haozhun Jin edited comment on HIVE-12192 at 4/4/18 11:02 PM:
-------------------------------------------------------------

[~jcamachorodriguez], thank you for your patient answer. Please bear with us for a little
bit more.

What do you think after reading the two experiments below?
h2. Experiment 1

I conducted this experiment once before. And this is why I'm under the impression that Hive
Timestamp type means Instant. I redid the experiment today in both 1.2.1 and 2.6.3. The result
is the same. Zone is America/Los_Angeles.

The table below summarizes the outcome: [raw|https://gist.github.com/haozhun/03cd09b3fa2456271f2e01759c9c1b8e]

 
||Query||Actual||expected LocalDateTime||expect Instant||
|current_timestamp()|2018-04-04 14:51|2018-04-04 14:51|2018-04-04 14:51|
|current_timestamp()
- interval '2880' hour|2017-12-05 13:51|2018-12-05 14:51|2017-12-05 13:51|
h2. Experiment 2

After reading your comments about ORC, I conducted some experiments to see how other file
formats behave. The result is not what I was expecting. I did the experiment in 2.6.3.

The table below summarizes the outcome: [raw|[https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198]|https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198])]

 
||(server +05:45)||ORC||RCBinary||RCText||Text||
|Insert (client -07:00)|06:00:00|06:00:00|06:00:00|06:00:00|
|Read 1 (client -07:00)|18:45:00|06:00:00|18:45:00|18:45:00|
|Read 2 (client -04:00)|18:45:00|09:00:00|18:45:00|18:45:00|
h2. Summary

This is my take away
 * experiment 1: Hive timestamp type has Instant semantics. If it's internal representation
is changed from java.sql.Timestamp to java.time.LocalDateTime, it will be a user-visible behavior
change.
 * experiment 2: Do not use Hive in a zone different from the server's (given insert and read
does not round trip in a single hive cli session). Hopefully, that's indeed how every one
uses Hive. In that case, it does not matter whether experiment 2 indicates Instant or LocalDateTime.

 


was (Author: haozhun):
[~jcamachorodriguez], thank you for your patient answer. Please bear with us for a little
bit more.

What do you think after reading the two experiments below?
h2. Experiment 1

I conducted this experiment once before. And this is why I'm under the impression that Hive
Timestamp type means Instant. I redid the experiment today in both 1.2.1 and 2.6.3. The result
is the same. Zone is America/Los_Angeles.

The table below summarizes the outcome: [raw|https://gist.github.com/haozhun/03cd09b3fa2456271f2e01759c9c1b8e]

 
||Query||Actual||expected LocalDateTime||expect Instant||
|current_timestamp()|2018-04-04 14:51|2018-04-04 14:51|2018-04-04 14:51|
|current_timestamp() - interval '2880' hour|2017-12-05 13:51|2018-12-05 14:51|2017-12-05 13:51|
h2. Experiment 2

After reading your comments about ORC, I conducted some experiments to see how other file
formats behave. The result is not what I was expecting. I did the experiment in 2.6.3.

The table below summarizes the outcome: [raw|[https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198]|https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198])]

 
||(server +05:45)||ORC||RCBinary||RCText||Text||
|Insert (client -07:00)|06:00:00|06:00:00|06:00:00|06:00:00|
|Read 1 (client -07:00)|18:45:00|06:00:00|18:45:00|18:45:00|
|Read 2 (client -04:00)|18:45:00|09:00:00|18:45:00|18:45:00|
h2. Summary

This is my take away
 * experiment 1: Hive timestamp type has Instant semantics. If it's internal representation
is changed from java.sql.Timestamp to java.time.LocalDateTime, it will be a user-visible behavior
change.
 * experiment 2: Do not use Hive in a zone different from the server's (given insert and read
does not round trip in a single hive cli session). Hopefully, that's indeed how every one
uses Hive. In that case, it does not matter whether experiment 2 indicates Instant or LocalDateTime.

 

> Hive should carry out timestamp computations in UTC
> ---------------------------------------------------
>
>                 Key: HIVE-12192
>                 URL: https://issues.apache.org/jira/browse/HIVE-12192
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Ryan Blue
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: timestamp
>         Attachments: HIVE-12192.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the SQL data
type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use {{Timestamp#getYear()}}
and similar methods to implement SQL functions like {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles that alternates
between PST and PDT, there are times that cannot be represented because the effective zone
skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a java.sql.Timestamp
avoids this bug, while still returning correct values for {{getYear}} etc. Using UTC as the
convenience representation (timestamp without time zone has no real zone) would make timestamp
calculations more consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the result is with
respect to ["the default timezone and default locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
That function would need to be updated to use the {{System.getProperty("user.timezone")}}
zone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message