drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5360) Timestamp type documented as UTC, implemented as local time
Date Thu, 16 Mar 2017 23:19:41 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929121#comment-15929121
] 

Paul Rogers edited comment on DRILL-5360 at 3/16/17 11:19 PM:
--------------------------------------------------------------

More experiments. Turns out Drill *tries* to follow the JDBC standard and return a {{Timestamp}}
as UTC. Here is how a {{NOW()}} query seems to work:

* Drill queries the system to get current time, UTC.
* Drill converts UTC to local time and stores it in the {{Timestamp}} vector.
* Drill sends the server-local time to the JDBC client.
* The JDBC client uses the client's local time zone to convert the {{Timestamp}} to UTC.
* JDBC converts the (quasi) UTC timestamp to a {{Timestamp}} object.

This works wonderfully if the client and server are in the same time zone. But, if they are
in different time zones, the answer will be wrong. Consider a server in EST, a client in PST.
Consider a time in UTC of 2017-03-10T01:00:00.

* Drill retrieves 2017-03-10T01:00:00 UTC from the system.
* Drill coverts the time to EST (UTC-5) or 2017-03-09T20:00:00 EST.
* Drill sends the EST time to the PST client.
* The PST client converts the time to (quasi) UTC using the PST offset: UTC-8: 2017-03-10T04:00:00
UTC.

Notice that the "UTC" time on the client (04:00:00) is *not* the same as the true UTC on the
server: 01:00:00.

This is a problem any time the client and server are in different time zones.

Lesson: Drill only supports Timestamp when the client and server are in the same time zone
(or configured to use the same time zones.)

Also, the documentation is very misleading. The reality:

* Literals are in UTC.
* Internal values are in server local time.
* JDBC values are in UTC + (client offset - server offset), but we call this UTC.


was (Author: paul-rogers):
More experiments. Turns out Drill *tries* to follow the JDBC standard and return a {{Timestamp}}
as UTC. Here is how a {{NOW()}} query seems to work:

* Drill queries the system to get current time, UTC.
* Drill converts UTC to local time and stores it in the {{Timestamp}} vector.
* Drill sends the server-local time to the JDBC client.
* The JDBC client uses the client's local time zone to convert the {{Timestamp}} to UTC.
* JDBC converts the (quasi) UTC timestamp to a {{Timestamp}} object.

This works wonderfully if the client and server are in the same time zone. But, if they are
in different time zones, the answer will be wrong. Consider a server in EST, a client in PST.
Consider a time in UTC of 2017-03-10T01:00:00.

* Drill retrieves 2017-03-10T01:00:00 UTC from the system.
* Drill coverts the time to EST (UTC-5) or 2017-03-09T20:00:00 EST.
* Drill sends the EST time to the PST client.
* The PST client converts the time to (quasi) UTC using the PST offset: UTC-8: 2017-03-10T04:00:00
UTC.

Notice that the "UTC" time on the client (04:00:00) is *not* the same as the true UTC on the
server: 01:00:00.

This is a problem any time the client and server are in different time zones.

Lesson: Drill only supports Timestamp when the client and server are in the same time zone
(or configured to use the same time zones.)

Also, the documentation is very misleading.

> Timestamp type documented as UTC, implemented as local time
> -----------------------------------------------------------
>
>                 Key: DRILL-5360
>                 URL: https://issues.apache.org/jira/browse/DRILL-5360
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>
> The Drill documentation implies that the {{Timestamp}} type is in UTC:
> bq. JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds
format: yyyy-MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill stores values in Coordinated
Universal Time (UTC). Drill supports time functions in the range 1971 to 2037. ... Drill does
not support TIMESTAMP with time zone.
> The above is ambiguous. The first part talks about JDBC timestamps. From the JDK Javadoc:
> bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is intended to reflect
coordinated universal time (UTC)...
> So, a JDBC timestamp is intended to represent time in UTC. (The "indented to reflect"
statement leaves open the possibility of misusing {{Date}} to represent times in other time
zones. This was common practice in early Java development and was the reason for the eventual
development of the Joda, then Java 8 date/time classes.)
> The Drill documentation implies that timestamp *literals* are in UTC, but a careful read
of the documentation does allow an interpretation that the internal representation can be
other than UTC. If this is true, then we would also rely on a liberal reading of the Java
`Timestamp` class to also not be UTC. (Or, we rely on the Drill JDBC driver to convert from
the (unknown) server time zone to a UTC value returned by the Drill JDBC client.)
> Still, a superficial reading (and common practice) would suggest that a Drill Timestamp
should be in UTC.
> However, a test on a Mac, with an embedded Drillbit (run in the Pacific time zone, with
Daylight Savings Time in effect) shows that the Timestamp binary value is actual local time:
> {code}
>       long before = System.currentTimeMillis();
>       long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" );
>       double hrsDiff = (value - before) / (1000.00 * 60 * 60);
>       System.out.println("Hours: " + hrsDiff);
> {code}
> The above gets the actual UTC time from Java. Then, it runs a query that gets Drill's
idea of the current time using the {{NOW()}} function. (The {{getDateValue}} function uses
the new test framework to access the actual {{long}} value from the returned value vector.)
Finally, we compute the difference between the two times, converted to hours. Output:
> {code}
> Hours: -6.9999975
> {code}
> As it turns out, this is the difference between UTC and PDT. So, the time is in local
time, not UTC.
> Since the documentation and implementation are both ambiguous, it is hard to know the
intent of the Drill Timestamp. Clearly, common practice is to use UTC. But, there is wiggle-room.
> If the Timestamp value is supposed to be local time, then Drill should provide a function
to return the server's time zone offset (in ms) from UTC so that the client can to the needed
local-to-UTC conversion to get a true timestamp.
> On the other hand, if the Timestamp is supposed to be UTC (per common practice), then
{{NOW()}} should not report local time, it should return UTC.
> Further, if {{NOW()}} returns local time, but Timestamp literals are UTC, then it is
hard to see how any query can be rationally written if one timestamp value is local, but a
literal is UTC.
> So, job #1 is to define the Timestamp semantics. Then, use that to figure out where the
bug lies to make implementation consistent with documentation (or visa-versa.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message