hive-dev mailing list archives

From "Teddy Choi (JIRA)" <>
Subject [jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
Date Sun, 01 Dec 2013 20:18:35 GMT


Teddy Choi commented on HIVE-5761:

I wrote a draft version.

DATE will be stored in a LongColumnVector. HIVE-4055 represents a DATE value as the number of
days since the epoch. A vectorized DATE value will contain this number plus an optional cached
parse result.

The result of a read operation or of a complex date function, such as date_add or date_sub,
will have an empty cache. The first simple date function applied to a value, such as year,
month or day, will cache its parse result. Subsequent simple functions will then reuse the
cache to avoid repeated parsing. Its effect on performance will be small, since java.util.Date
calculates all fields at once and caches their results.

The first 32-bit set will hold the number of days since the epoch as a signed integer. Its
range is roughly from BC 2^31/365-1970 to AD 2^31/365+1970, that is, about 5.88 million years
on either side of 1970. A comparison between vectorized DATE values should consider only their
first sets.

The following 32-bit set will hold the cached parse result: cached state (1 bit; 0 for not
cached, 1 for cached), era (1 bit; 0 for AD, 1 for BC), year (unsigned 21-bit integer), month
(unsigned 4-bit integer) and day of month (unsigned 5-bit integer). A value without a cache
will have only zero bits after its first set. Parsed year, month and day of month values
start from 1, so each field holds the exact number. The cacheable range is from BC 2^21 to
AD 2^21, which is narrower than the range of the first set. If a date is outside that range,
its cached state will remain false (0). The value 0xFFFFFFFF00000000L shall be reserved for
future use to indicate data outside the standard range.
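
To make the proposed layout concrete, here is a minimal Java sketch of the packing, lazy caching and comparison described above. It is not the Hive implementation. The class and method names are hypothetical, and two layout details are assumptions the draft does not spell out: that the "first set" is the low 32 bits of the long, and that the cache fields are laid out from the most significant bit in the order listed. It also uses java.time.LocalDate (proleptic ISO calendar) in place of java.util.Date for the parse, so very old dates may differ from what java.util.Date would report.

```java
import java.time.LocalDate;

/** Hypothetical helper, not part of Hive: packs a DATE and its parse cache into one long. */
final class VectorizedDateSketch {
    // Assumed layout: low 32 bits  = signed days since epoch (the "first set"),
    //                 high 32 bits = cache word (the "following set").
    // Assumed cache word order, most significant bit first:
    //   cached (1) | era (1) | year (21) | month (4) | day of month (5)
    private static final int CACHED_BIT = 1 << 31;
    private static final int ERA_BC_BIT = 1 << 30;
    private static final int YEAR_SHIFT = 9;
    private static final int MONTH_SHIFT = 5;
    private static final int YEAR_MASK = (1 << 21) - 1;
    private static final int MONTH_MASK = 0xF;
    private static final int DAY_MASK = 0x1F;
    private static final long DAYS_MASK = 0xFFFFFFFFL;

    private VectorizedDateSketch() {}

    /** Packs a day count with an empty cache (all cache bits zero). */
    static long fromEpochDays(int days) {
        return days & DAYS_MASK;
    }

    /** Extracts the signed day count from the low 32 bits. */
    static int epochDays(long value) {
        return (int) value;
    }

    private static int cacheWord(long value) {
        return (int) (value >>> 32);
    }

    static boolean isCached(long value) {
        return (cacheWord(value) & CACHED_BIT) != 0;
    }

    /** Returns the value with its cache filled, or unchanged if the year does not fit in 21 bits. */
    static long withCache(long value) {
        if (isCached(value)) {
            return value;
        }
        LocalDate date = LocalDate.ofEpochDay(epochDays(value));
        int isoYear = date.getYear();          // proleptic ISO: year 0 is 1 BC
        boolean bc = isoYear <= 0;
        int yearOfEra = bc ? 1 - isoYear : isoYear;
        if (yearOfEra > YEAR_MASK) {
            return value;                      // outside the cacheable range: stay uncached
        }
        int word = CACHED_BIT
                | (bc ? ERA_BC_BIT : 0)
                | (yearOfEra << YEAR_SHIFT)
                | (date.getMonthValue() << MONTH_SHIFT)
                | date.getDayOfMonth();
        return (value & DAYS_MASK) | ((long) word << 32);
    }

    // The accessors below are only meaningful when isCached(value) is true.
    static boolean isBc(long value) {
        return (cacheWord(value) & ERA_BC_BIT) != 0;
    }

    static int year(long value) {
        return (cacheWord(value) >>> YEAR_SHIFT) & YEAR_MASK;
    }

    static int month(long value) {
        return (cacheWord(value) >>> MONTH_SHIFT) & MONTH_MASK;
    }

    static int dayOfMonth(long value) {
        return cacheWord(value) & DAY_MASK;
    }

    /** Compares two values by day count only, ignoring any cache bits. */
    static int compare(long a, long b) {
        return Integer.compare(epochDays(a), epochDays(b));
    }
}
```

In this sketch, a caller would create a value with fromEpochDays, call withCache before the first year, month or day lookup, and store the returned long back into the vector so later lookups skip the parse. Under the assumed layout, the reserved value 0xFFFFFFFF00000000L has a month field of 15, so it cannot collide with a cache word produced by a real parse.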

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>                 Key: HIVE-5761
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
> Add support to allow queries referencing DATE columns and expression results to run efficiently
in vectorized mode. This should re-use the code for the integer/timestamp types to the
extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using
or extending existing end-to-end tests for vectorized integer and/or timestamp operations.

This message was sent by Atlassian JIRA
