hive-dev mailing list archives

From "Teddy Choi (JIRA)" <>
Subject [jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
Date Thu, 28 Nov 2013 03:06:36 GMT


Teddy Choi commented on HIVE-5761:


I researched the history of the Hive DATE data type.

1. DATE in ORC: HIVE-4055 already implemented it. It uses an integer field, DateWritable#daysSinceEpoch,
to represent a date. I think there is little chance of using an alternative representation,
which I would prefer.
2. Basic operations: We may need to use java.sql.Date every time. [~thejas] and [~jdere] already
suggested the Joda-Time library, which is significantly faster, but there were objections
to adding dependencies in HIVE-3910.
3. Complex operations: Fortunately, they will benefit from the DateWritable#daysSinceEpoch representation.
4. Vectorized plan: I'm not sure yet. I will run some tests.
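
As a rough illustration of why a days-since-epoch integer helps, date arithmetic reduces to
plain integer arithmetic over arrays. The sketch below is only an assumption about what such
operations could look like: the class and method names are hypothetical, the inputs are plain
long[] arrays rather than Hive's column vector classes, and null handling and selection
vectors are left out.

```java
/** Hypothetical sketch: date arithmetic on days-since-epoch values stored in long arrays. */
final class DaysSinceEpochArithmeticSketch {

    /** Element-wise difference in days: out[i] = endDays[i] - startDays[i] for i < n. */
    static void dateDiff(long[] endDays, long[] startDays, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = endDays[i] - startDays[i];
        }
    }

    /** Adds a constant number of days to each value: out[i] = days[i] + delta for i < n. */
    static void dateAdd(long[] days, long delta, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = days[i] + delta;
        }
    }

    private DaysSinceEpochArithmeticSketch() { }
}
```

Neither loop allocates an object per row, which is the property a vectorized expression needs.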

The key question is how to improve the performance of basic operations with DateWritable#daysSinceEpoch.
I found that org.joda.time.Chronology does not create objects during repetitive calculations.
That gives me an idea, but it looks hard to implement.
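
One possible shape for an allocation-free basic operation is sketched here. It is not Joda-Time's
or Hive's code: it uses the widely known "civil from days" integer algorithm to split a
days-since-epoch value into year, month, and day of month. The class and method names are
hypothetical, the inputs are plain long[] arrays, and the sketch assumes the proleptic Gregorian
calendar with no time zone adjustment, so results may differ from java.sql.Date for very old
dates or for non-UTC default time zones.

```java
/** Hypothetical sketch: extract year/month/day from days since 1970-01-01 without allocating. */
final class EpochDayFieldsSketch {

    /** Fills year[i], month[i] (1-12) and day[i] (1-31) for the first n entries of epochDays. */
    static void extractFields(long[] epochDays, int n, long[] year, long[] month, long[] day) {
        for (int i = 0; i < n; i++) {
            long z = epochDays[i] + 719468L;           // re-base so day 0 is 0000-03-01
            long era = Math.floorDiv(z, 146097L);      // number of whole 400-year cycles
            long doe = z - era * 146097L;              // day within the cycle, 0..146096
            long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // 0..399
            long doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
            long mp = (5 * doy + 2) / 153;             // March-based month, 0 = March
            long m = mp < 10 ? mp + 3 : mp - 9;        // calendar month, 1..12
            day[i] = doy - (153 * mp + 2) / 5 + 1;
            month[i] = m;
            year[i] = yoe + era * 400 + (m <= 2 ? 1 : 0); // Jan and Feb belong to the next year
        }
    }

    private EpochDayFieldsSketch() { }
}
```

The loop body uses only long arithmetic, so it creates no objects per row.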

I'll start with a basic implementation using java.sql.Date, then look for more ways to optimize it.


> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>                 Key: HIVE-5761
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
> Add support to allow queries referencing DATE columns and expression results to run efficiently
in vectorized mode. This should re-use the code for the integer/timestamp types to the
extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using
or extending existing end-to-end tests for vectorized integer and/or timestamp operations.

This message was sent by Atlassian JIRA
