Date: Fri, 26 Apr 2019 00:54:00 +0000 (UTC)
From: "Jesus Camacho Rodriguez (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

[ https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-21291:
-------------------------------------------
Resolution: Fixed
Fix Version/s:
4.0.0
Status: Resolved (was: Patch Available)

Pushed to master, thanks [~klcopp]. Would you mind rebasing on top of branch-3 and branch-3.1 so we can backport to those branches too?

> Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time
> ------------------------------------------------------------------------------------------------------
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch, HIVE-21291.5.patch, HIVE-21291.6.patch, HIVE-21291.7.patch, HIVE-21291.7.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this leads to the desired new semantics, it also leads to incorrect results when new Hive versions read timestamps written by old Hive versions or when old Hive versions or any other component not aware of this change (including legacy Impala and Spark versions) read timestamps written by new Hive versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. In itself, this would restore the historical _Instant_ semantics, which is undesirable.
> In order to achieve the desired _LocalDateTime_ semantics in spite of normalizing to UTC, newer Hive versions should record the session-local time zone in the file metadata fields serving arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or any other new component aware of this extra metadata) can achieve _LocalDateTime_ semantics by *converting from UTC to the saved time zone (instead of to the local time zone)*. Legacy components that are unaware of the new metadata can read the files without any problem and the timestamps will show the historical _Instant_ behaviour to them.
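
The write and read paths described in the Solution section can be sketched roughly as follows. This is only an illustration in Python, not the Hive patch itself: the metadata key name, the function names, and the use of fixed UTC offsets in minutes (instead of named time zones) are all assumptions made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical metadata key; the real key used by Hive is not stated in this message.
WRITER_ZONE_KEY = "writer.time.zone.offset.minutes"


def write_timestamp(local_dt, session_offset_minutes):
    """Normalize a wall-clock timestamp to UTC and record the writer's zone.

    Returns (epoch_millis, metadata), where metadata stands in for the
    file's arbitrary key-value metadata fields.
    """
    session_tz = timezone(timedelta(minutes=session_offset_minutes))
    # Interpret the naive wall-clock value in the session zone, then move to UTC.
    utc_dt = local_dt.replace(tzinfo=session_tz).astimezone(timezone.utc)
    millis = (utc_dt - EPOCH) // timedelta(milliseconds=1)
    return millis, {WRITER_ZONE_KEY: str(session_offset_minutes)}


def read_timestamp(millis, metadata, reader_offset_minutes):
    """Convert stored UTC millis back to a wall-clock timestamp.

    A metadata-aware reader converts to the saved writer zone. Passing an
    empty dict simulates a legacy reader, which falls back to its own zone.
    """
    offset = int(metadata.get(WRITER_ZONE_KEY, reader_offset_minutes))
    target_tz = timezone(timedelta(minutes=offset))
    instant = EPOCH + timedelta(milliseconds=millis)
    return instant.astimezone(target_tz).replace(tzinfo=None)


# Writer session at UTC+02:00, reader session at UTC-07:00.
wall_clock = datetime(2019, 4, 26, 12, 0)
millis, meta = write_timestamp(wall_clock, 120)

new_reader = read_timestamp(millis, meta, -420)   # 2019-04-26 12:00 (LocalDateTime semantics)
legacy_reader = read_timestamp(millis, {}, -420)  # 2019-04-26 03:00 (Instant semantics)
```

With the saved zone available, the reader gets back the same wall-clock value regardless of its own session zone, while a reader that ignores the metadata sees the same instant rendered in its own zone.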