orc-dev mailing list archives

From wgtmac <...@git.apache.org>
Subject [GitHub] orc issue #233: ORC-322: [C++] Fix writing & reading timestamp
Date Tue, 20 Mar 2018 03:34:14 GMT
Github user wgtmac commented on the issue:

    https://github.com/apache/orc/pull/233
  
    Thanks @majetideepak for the comment!
    
    On the Java side, the input timestamp in the writer's TimestampColumnVector is in UTC. It leverages
java.sql.Timestamp, which knows the local timezone and can therefore PRINT the value in local time.
You can print the millis variable at line 109 of TimestampTreeWriter.java to verify this. The
name of SerializationUtils.convertToUtc(localTimezone, millis) at line 113 is rather confusing,
because the result is not the current timestamp in UTC; instead it adds an offset for the local
timezone, which I think is also a problem.
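
    To illustrate the point about java.sql.Timestamp, here is a minimal standalone sketch. It is not
the ORC code: the class TimestampZoneSketch and the helpers printIn and shiftByZoneOffset are hypothetical
names, the sample instant and the America/Los_Angeles zone are arbitrary choices, and the direction
of the shift is an assumption. It only shows that the millis value is zone-independent while the
printed form follows the default timezone, and that shifting the millis by a zone offset yields
a different instant rather than "the same time in UTC".

```java
import java.sql.Timestamp;
import java.util.TimeZone;

class TimestampZoneSketch {

    // Hypothetical helper (NOT SerializationUtils.convertToUtc): shifts an
    // epoch-millis value by the zone's offset at that instant. The result is
    // a different instant, not a UTC rendering of the same one.
    static long shiftByZoneOffset(TimeZone zone, long epochMillis) {
        return epochMillis + zone.getOffset(epochMillis);
    }

    // Prints the same epoch-millis value as java.sql.Timestamp would when the
    // JVM default timezone is `zone`. The default is restored afterwards.
    static String printIn(TimeZone zone, long epochMillis) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(zone);
            return new Timestamp(epochMillis).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        long millis = 1521516854000L; // arbitrary sample instant (epoch millis)
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");

        // Same stored value, different printed text depending on the zone.
        System.out.println("millis          = " + millis);
        System.out.println("printed in UTC  = " + printIn(utc, millis));
        System.out.println("printed in LA   = " + printIn(la, millis));

        // Shifting the value changes the instant itself.
        long shifted = shiftByZoneOffset(la, millis);
        System.out.println("shifted millis  = " + shifted);
        System.out.println("shifted in UTC  = " + printIn(utc, shifted));
    }
}
```

    Printing the shifted value in UTC gives the same text as printing the original value in the
local zone, which is why a shifted value can look right in a printer while no longer being a UTC timestamp.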
    
    ORC-10 fixed the bug of the missing writer timezone. The original design was meant to be resilient
to moving between different reader timezones. However, this caused an issue in C++ between timezones
with different daylight saving rules, so the writer timezone is now forced to be written. The GMT
offset that ORC-10 adds actually converts the value to the local timezone so that ColumnPrinter can
print the same time in the local timezone. This causes a new problem: the C++ reader returns the
timestamp value in the local timezone, not UTC, which differs from the Java reader. I believe this
is why @owen created [ORC-37](https://issues.apache.org/jira/browse/ORC-37). The SQL type TimestampTz
is a new type, distinct from the traditional SQL type Timestamp. I don't think it is a good idea to mix
the ORC timestamp type with TimestampTz, and there is another open issue for it: [ORC-189](https://issues.apache.org/jira/browse/ORC-189)
    
    It is very confusing that an input timestamp written with the Java writer is read differently
by the C++ reader. I think we need to fix it, and doing so can also resolve ORC-37. What do you think?

