hive-dev mailing list archives

From "Teddy Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
Date Thu, 28 Nov 2013 03:06:36 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834489#comment-13834489 ]

Teddy Choi commented on HIVE-5761:
----------------------------------

Eric,

I researched the history of the Hive DATE data type.

1. DATE in ORC: HIVE-4055 already implemented it. It uses an integer variable, DateWritable#daysSinceEpoch,
to represent a date. I think it would be hard to use the alternative representation,
which I prefer.
2. Basic operations: We may need to go through java.sql.Date every time. [~thejas] and [~jdere] already
suggested the Joda-Time library, which is significantly faster, but there were objections
to additional dependencies in HIVE-3910.
3. Complex operations: Fortunately, they will benefit from the DateWritable#daysSinceEpoch representation.
4. Vectorized plan: I'm not sure yet. I will run some tests.
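
To illustrate the "Complex operations" item, here is a minimal sketch of why a days-since-epoch value suits vectorized code: date difference, date addition and range filters reduce to integer arithmetic over arrays. Plain long[] arrays stand in for a column vector; the class and method names are hypothetical and are not Hive's actual vectorization API.

```java
// Hypothetical helper, not part of Hive: dates are held as days since 1970-01-01.
final class EpochDayVectorOps {

    // out[i] = number of days from b[i] to a[i] (what a datediff would compute).
    static void dateDiff(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] - b[i];
        }
    }

    // out[i] = days[i] shifted by delta days (negative delta subtracts days).
    static void addDays(long[] days, long delta, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = days[i] + delta;
        }
    }

    // Writes the indices of rows with days[i] >= lowerBound into selected
    // and returns how many rows qualified.
    static int filterGreaterEqual(long[] days, long lowerBound, int[] selected, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (days[i] >= lowerBound) {
                selected[count++] = i;
            }
        }
        return count;
    }
}
```

None of these loops allocates an object per row or touches a calendar, which is the property that makes this representation attractive for such operations.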

The key question is how to improve the performance of basic operations on DateWritable#daysSinceEpoch.
I found that org.joda.time.Chronology does not create objects during repetitive calculations
(http://stackoverflow.com/questions/6465330/any-good-high-performance-java-library-that-works-with-timestamp).
That gives me an idea, but it looks hard to implement.
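
As a rough sketch of what allocation-free field extraction could look like, the code below derives year, month and day from a days-since-epoch value using only integer arithmetic. It is not Joda-Time's implementation; it uses the well-known "civil from days" arithmetic for the proleptic Gregorian calendar, so results may differ from java.sql.Date for very old dates. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Hive or Joda-Time.
final class EpochDayFields {

    // Packs (year, month, day) into one int: year in the high bits,
    // month in 4 bits, day of month in 5 bits. No objects are created.
    static int pack(long epochDay) {
        long z = epochDay + 719468;                        // days since 0000-03-01
        long era = (z >= 0 ? z : z - 146096) / 146097;     // 400-year cycle index
        long doe = z - era * 146097;                       // day of era, 0..146096
        long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // 0..399
        long doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
        long mp = (5 * doy + 2) / 153;                     // 0 = March .. 11 = February
        long day = doy - (153 * mp + 2) / 5 + 1;           // 1..31
        long month = mp < 10 ? mp + 3 : mp - 9;            // 1..12
        long year = yoe + era * 400 + (month <= 2 ? 1 : 0);
        return (int) ((year << 9) | (month << 5) | day);
    }

    static int year(long epochDay)       { return pack(epochDay) >> 9; }
    static int month(long epochDay)      { return (pack(epochDay) >> 5) & 0xF; }
    static int dayOfMonth(long epochDay) { return pack(epochDay) & 0x1F; }

    // Vectorized form: fills out[i] with the year of days[i] for the first n rows.
    static void years(long[] days, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = year(days[i]);
        }
    }
}
```

Whether this beats a java.sql.Date based implementation in practice would have to be measured; the sketch only shows that the calculation itself needs no per-row objects.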

I'll start with a basic implementation using java.sql.Date, then look for ways to optimize
it.

Teddy

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DATE columns and expression results to run efficiently
> in vectorized mode. This should re-use the code for the integer/timestamp types to the
> extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using
> or extending existing end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
