From: stiga-huang
To: dev@orc.apache.org
Reply-To: dev@orc.apache.org
Subject: [GitHub] orc issue #233: ORC-322: [C++] Fix writing & reading timestamp
Content-Type: text/plain
Message-Id: <20180322010210.C01CCF4E28@git1-us-west.apache.org>
Date: Thu, 22 Mar 2018 01:02:10 +0000 (UTC)

Github user stiga-huang commented on the issue:

    https://github.com/apache/orc/pull/233

@omalley @majetideepak @wgtmac Thanks for following up on ORC-322!

If I understand correctly, the convention is that TimestampColumnVector should only accept timestamps in local time. Timestamp values stored in an ORC file are `local_timestamp - local_orc_epoch`. A TimestampColumnVector obtained from the Java reader holds timestamps in local time. However, a TimestampColumnVector obtained from the C++ reader holds UTC timestamps.

If so, the C++ writer doesn't need to subtract gmtOffset from each timestamp, because after shifting, the values in the ORC file are `utc_timestamp - local_orc_epoch`.

If not, I think the bug in ORC-320 should still be fixed (ORC-322 aims to fix ORC-320). The root cause of ORC-320 is that the gmtOffsets obtained in the writer and the reader can differ, even though both use the same Timezone. Specifically, the writer looks up gmtOffset using timestamp `ts`, then writes `ts - gmtOffset` (let's ignore the ORC epoch, since it's the same in the writer and the reader). The reader uses `ts - gmtOffset` to look up gmtOffset2, then reads out `ts - gmtOffset + gmtOffset2`. However, `gmtOffset2` may not equal `gmtOffset`.

Thanks for your patience in reading this long comment!

---
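
The mismatch between `gmtOffset` and `gmtOffset2` described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `gmt_offset_at`, `TRANSITION`, and the two offsets are hypothetical stand-ins for a timezone lookup around a single offset change. They are not the ORC Timezone API, and the ORC epoch is ignored as in the comment.

```python
# Toy model, not the ORC Timezone API: a hypothetical zone whose GMT offset
# changes from -8h to -7h at the instant TRANSITION (all values in seconds).
TRANSITION = 1_000_000
OFFSET_BEFORE = -8 * 3600
OFFSET_AFTER = -7 * 3600


def gmt_offset_at(seconds):
    """Return the assumed GMT offset in effect at the given instant."""
    return OFFSET_BEFORE if seconds < TRANSITION else OFFSET_AFTER


def write_timestamp(ts):
    """Writer as described: look up the offset by ts, store ts - gmtOffset."""
    return ts - gmt_offset_at(ts)


def read_timestamp(stored):
    """Reader as described: look up gmtOffset2 by the stored value, add it."""
    return stored + gmt_offset_at(stored)


def round_trip_error(ts):
    """Difference between the value read back and the value written."""
    return read_timestamp(write_timestamp(ts)) - ts


if __name__ == "__main__":
    for ts in (TRANSITION - 10 * 3600, TRANSITION - 3600, TRANSITION + 3600):
        print(ts, round_trip_error(ts))
```

Under these assumptions, a timestamp in the 8 hours before the transition is written with the old offset but read back with the new one, so it returns shifted by one hour. Timestamps outside that window round-trip unchanged.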