hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison
Date Mon, 17 Apr 2017 23:51:42 GMT


Ashutosh Chauhan commented on HIVE-16418:

We need to think about the storage type for Timestamp at different stages of query processing:

* On-disk format: Whether to store the TZ or not. The primary concern is fidelity of the
original data and the secondary concern is storage efficiency.
* In-memory format: The format on which computations are performed. As I see it, our current
choice of Timestamp here is inappropriate. The issue is that java.sql.Timestamp (which implicitly
assumes the local time zone) doesn't correspond to either SQL Timestamp (which is essentially
zoneless) or Timestamp with Time Zone (which has a zone, but java.sql.Timestamp doesn't allow
you to set one). As I suggested, the in-memory representation (i.e. the one on which all
computations are performed) should either directly use java.time's LocalDateTime and
ZonedDateTime or model its behavior on them.
* Serialization format: Used to transfer timestamps between different vertices. Here the primary
concern is performance, which we get if the TZ is stored separately.
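
To illustrate the in-memory point, here is a minimal sketch. It assumes the classes meant
above are java.time.LocalDateTime and java.time.ZonedDateTime. The class TimestampModels and
its methods are hypothetical names for illustration only and are not Hive code.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration class; not part of Hive.
class TimestampModels {

    // Zoneless SQL TIMESTAMP: wall-clock fields only, no zone attached.
    static LocalDateTime zoneless() {
        return LocalDateTime.of(2017, 4, 17, 23, 51, 42);
    }

    // TIMESTAMP WITH TIME ZONE: the same wall-clock fields plus an explicit zone.
    static ZonedDateTime zoned(String zoneId) {
        return zoneless().atZone(ZoneId.of(zoneId));
    }

    // java.sql.Timestamp keeps only the instant; the zone of the input is lost.
    static Timestamp legacy(ZonedDateTime z) {
        return Timestamp.from(z.toInstant());
    }

    public static void main(String[] args) {
        ZonedDateTime shanghai = zoned("Asia/Shanghai");
        Timestamp ts = legacy(shanghai);

        System.out.println("zoneless: " + zoneless());
        System.out.println("zoned:    " + shanghai);
        // toString() and toLocalDateTime() use the JVM default zone, so the
        // rendered value depends on where this code runs.
        System.out.println("legacy:   " + ts + " / " + ts.toLocalDateTime());
    }
}
```

The zoneless value never changes meaning with the JVM default zone, and the zoned value
carries its zone explicitly. The java.sql.Timestamp value preserves the instant but renders
it in whatever zone the JVM happens to use, which is the mismatch described above.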

In light of the above, I am OK with your proposal of using choice #2, but I think you still need
to think about the in-memory format, because apart from to_utc_timestamp and related UDFs,
implementing the new type, Timestamp with Time Zone, with java.sql.Timestamp will be error-prone.

> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>                 Key: HIVE-16418
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16418.1.patch
> The feature is required when we have to serialize some fields and prevent them from being
> used in comparison, e.g. HIVE-14412.
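
As a rough sketch of what the issue description asks for, the comparison below ignores a fixed
number of trailing bytes in each serialized key. The class PrefixKeyComparison and the method
compareSkippingTail are hypothetical names and do not reflect the actual HiveKey API or the
attached patch. Placing the skipped bytes at the end of the key is also an assumption.

```java
// Hypothetical illustration class; not the HiveKey implementation.
class PrefixKeyComparison {

    // Compares two serialized keys as unsigned bytes in lexicographic order,
    // ignoring the last `skip` bytes of each key.
    static int compareSkippingTail(byte[] a, byte[] b, int skip) {
        int lenA = Math.max(0, a.length - skip);
        int lenB = Math.max(0, b.length - skip);
        int n = Math.min(lenA, lenB);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        // All compared bytes are equal: the shorter compared prefix sorts first.
        return Integer.compare(lenA, lenB);
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 9};
        byte[] k2 = {1, 2, 7};
        // With skip = 1 the last byte is serialized but does not affect ordering.
        System.out.println(compareSkippingTail(k1, k2, 1));
        System.out.println(compareSkippingTail(k1, k2, 0));
    }
}
```

With this shape, the skipped bytes still travel with the key during the shuffle, but two keys
that differ only in those bytes compare as equal.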
