orc-dev mailing list archives

From stiga-huang <...@git.apache.org>
Subject [GitHub] orc issue #233: ORC-322: [C++] Fix writing & reading timestamp
Date Thu, 22 Mar 2018 01:02:10 GMT
Github user stiga-huang commented on the issue:

    https://github.com/apache/orc/pull/233
  
    @omalley @majetideepak @wgtmac Thanks for following up on ORC-322! If I understand these
comments correctly, the convention is that TimestampColumnVector should only accept timestamps
in local time, and the timestamp values stored in an ORC file are `local_timestamp - local_orc_epoch`.
A TimestampColumnVector returned by the Java reader holds timestamps in local time. However,
a TimestampColumnVector returned by the C++ reader holds UTC timestamps.
    
    If so, the C++ writer doesn't need to subtract gmtOffset from each timestamp, because after
that shift the values in the ORC file are `utc_timestamp - local_orc_epoch`.
    
    If not, I think the bug in ORC-320 should still be fixed (ORC-322 aims to fix ORC-320).
The root cause of ORC-320 is that the gmtOffsets obtained by the writer and the reader can differ,
even though they use the same Timezone.
    
    To be specific, the writer looks up gmtOffset using the timestamp `ts`, then writes `ts - gmtOffset`
(let's ignore the ORC epoch, since it's the same in the writer and the reader). The reader uses
`ts - gmtOffset` to look up gmtOffset2, then reads out `ts - gmtOffset + gmtOffset2`. However,
`gmtOffset2` may not equal `gmtOffset`.
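
    To make the mismatch concrete, here is a minimal sketch of the round trip described above.
It is a toy model, not ORC code: `TRANSITION`, `gmt_offset`, `write_ts` and `read_ts` are hypothetical
names, and the timezone is assumed to have a single offset change (from -8h to -7h) at an arbitrary instant.

```python
# Toy model of the writer/reader round trip. All names and numbers here are
# hypothetical illustrations; none of them are ORC APIs.

HOUR = 3600
TRANSITION = 1_000_000  # assumed instant at which the zone's offset changes


def gmt_offset(t):
    """Offset in seconds of a toy timezone: -8h before TRANSITION, -7h from it on."""
    return -8 * HOUR if t < TRANSITION else -7 * HOUR


def write_ts(ts):
    """Writer side: look up the offset with ts itself, then store ts - gmtOffset."""
    return ts - gmt_offset(ts)


def read_ts(stored):
    """Reader side: look up the offset with the stored value, then add it back."""
    return stored + gmt_offset(stored)


def round_trip_error(ts):
    """Difference between the value read back and the value originally written."""
    return read_ts(write_ts(ts)) - ts
```

    In this toy model, `round_trip_error(TRANSITION - HOUR)` is 3600, because the writer looks up
-8h while the reader, looking up the shifted value, gets -7h. Timestamps far from the offset change
round-trip with an error of 0.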
    
    Thanks for your patience reading this long comment!

