hive-issues mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-16418) Allow HiveKey to skip some bytes for comparison
Date Thu, 13 Apr 2017 11:45:41 GMT


Rui Li commented on HIVE-16418:

[~gopalv] - thanks for the review.
My plan is to allow only the GMT-offset timezone format, which means '2005-04-03 10:01:00 Asia/Shanghai'
will be converted to '2005-04-03 10:01:00 GMT+08:00' internally. Per Jason's [comment|],
the timezone part shouldn't be used for comparison. Therefore, '2005-04-03 10:01:00 GMT+08:00'
== '2005-04-03 02:01:00 GMT', and a count(distinct) over these two timestamps should
return 1.
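
To illustrate the equality semantics described here, the following is a minimal Java sketch using java.time. It is not Hive code: the class TimestampTzSketch and the helpers toInstant and countDistinct are hypothetical names made up for this example, and the sketch assumes that "ignoring the timezone part" amounts to comparing the underlying instants.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.HashSet;

// Hypothetical illustration only; none of these names exist in Hive.
final class TimestampTzSketch {

    // Interpret a local date-time (ISO form, e.g. 2005-04-03T10:01:00)
    // in the given zone and return the instant it denotes.
    static Instant toInstant(String localDateTime, String zone) {
        return LocalDateTime.parse(localDateTime)
                .atZone(ZoneId.of(zone))
                .toInstant();
    }

    // Count distinct values when equality is defined on the instant alone,
    // so the zone in which a value was written plays no role.
    static int countDistinct(Instant... values) {
        return new HashSet<>(Arrays.asList(values)).size();
    }

    public static void main(String[] args) {
        Instant a = toInstant("2005-04-03T10:01:00", "GMT+08:00");
        Instant b = toInstant("2005-04-03T02:01:00", "GMT");
        System.out.println(a.equals(b));         // prints "true"
        System.out.println(countDistinct(a, b)); // prints "1"
    }
}
```

Under this assumption, a value written with 'Asia/Shanghai' and one written with 'GMT+08:00' also compare equal for this date, since both denote the same offset.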

I agree this may cause some confusion in queries with distinct/groupBy like you mentioned.
[~jdere], [~xuefuz] could you please share how this should be handled according to the SQL

This patch could have been included in HIVE-14412, but I'd like to get some early feedback
and suggestions. The basic idea is to store all the non-comparable bytes at the beginning
of HiveKey. A boolean is added to HiveKey to indicate whether such bytes exist, and these
bytes are skipped accordingly in comparison. In the serialized format, the boolean is
encoded using the MSB of the length part. Does this make sense?
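
The layout described here can be sketched roughly as follows. This is not the attached patch: the class SkippableKeySketch and its methods are hypothetical, and the sketch assumes a 4-byte length header plus one extra byte that holds the number of bytes to skip (so at most 255), a detail the description above leaves open.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not HiveKey and not the patch.
final class SkippableKeySketch {

    // Most significant bit of the 4-byte length header, used as the flag.
    static final int SKIP_FLAG = 0x80000000;

    // Layout: [int length | flag][skip count (1 byte)][skipped bytes][comparable bytes]
    // The skip count and skipped bytes are present only when the flag is set.
    static byte[] serialize(byte[] skipped, byte[] comparable) {
        boolean hasSkip = skipped.length > 0;
        int payloadLen = comparable.length + (hasSkip ? 1 + skipped.length : 0);
        ByteBuffer buf = ByteBuffer.allocate(4 + payloadLen);
        buf.putInt(hasSkip ? (payloadLen | SKIP_FLAG) : payloadLen);
        if (hasSkip) {
            buf.put((byte) skipped.length); // assumed: at most 255 skipped bytes
            buf.put(skipped);
        }
        buf.put(comparable);
        return buf.array();
    }

    // Offset of the first comparable byte within a serialized key.
    static int comparableOffset(byte[] key) {
        int header = ByteBuffer.wrap(key).getInt();
        boolean hasSkip = (header & SKIP_FLAG) != 0;
        return hasSkip ? 4 + 1 + (key[4] & 0xFF) : 4;
    }

    // Unsigned lexicographic comparison over the comparable bytes only.
    static int compare(byte[] a, byte[] b) {
        int i = comparableOffset(a);
        int j = comparableOffset(b);
        while (i < a.length && j < b.length) {
            int diff = (a[i++] & 0xFF) - (b[j++] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return (a.length - i) - (b.length - j);
    }

    public static void main(String[] args) {
        byte[] k1 = serialize(new byte[] {8}, new byte[] {1, 2, 3});
        byte[] k2 = serialize(new byte[0], new byte[] {1, 2, 3});
        System.out.println(compare(k1, k2)); // prints "0"
    }
}
```

Masking the header with ~SKIP_FLAG recovers the plain length, which is why the flag can share the length field as long as lengths stay below 2^31.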

> Allow HiveKey to skip some bytes for comparison
> -----------------------------------------------
>                 Key: HIVE-16418
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-16418.1.patch
> The feature is required when we have to serialize some fields and prevent them from being
> used in comparison, e.g. HIVE-14412.

This message was sent by Atlassian JIRA
