hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3910) Create a new DATE datatype
Date Thu, 06 Jun 2013 03:14:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676626#comment-13676626
] 

Jason Dere commented on HIVE-3910:
----------------------------------

HIVE-4055 already has a patch with an initial implementation of a DATE type, which has already
done quite a bit of the work for DATE support. Took a look at this and I had a few proposed
additions to this:

1. Use Joda Time rather than java.sql.Date
The existing patch uses java.sql.Date as the underlying data type (based on java.util.Date).
 Thejas proposed using the Joda Time library as this is supposed to be a better datetime
implementation, and is also used by Pig for datetime handling.  It does not appear that Joda
Time is currently used by Hive and so this would need to be pulled in as a dependent library.

2. Storage of DATE values
In the existing patch, DateWritable writes out long value (8 bytes) representing seconds since
the Unix epoch.  As mentioned in HIVE-3910, since DATE is in days, we could reduce the storage
space by instead storing a 4-byte integer value representing days since some epoch (1970?
4713 BC?). The range of dates that we can represent with such an integer representation would
be +/- 2 billion days, or 5.8M years.

3. Considerations for Hive vectorization support
Talking to some folks who are concerned about Hive vectorization (HIVE-4160), and in the interests
of vectorization support they want the date type to be represented as primitive values.  They
are proposing that DateWritable would hold the integer value (rather than Date value) which
will still be usable for comparison operations, which would be the most common operations
that would be used on date types (group-by, sorting).  If an actual Date value is required,
then DateWritable.get() will generate a Date object based on the days-since-epoch integer
value.

4. SQL syntax compliance
The existing patch creates date values using a DATE() UDF - DATE('2013-01-01). The SQL standard
actually has syntax to represent a date literal - DATE '2013-01-01'.  The Hive grammar would
need to be extended to support the SQL syntax.

5. Operations on DATE types
The SQL standard (section 6.14) looks like it just supports DATE operations involving the
INTERVAL type:
        <datetime value expression> ::=
                <datetime term>
              | <interval value expression> <plus sign> <datetime
term>
              | <datetime value expression> <plus sign> <interval
term>
              | <datetime value expression> <minus sign> <interval
term>

There is currently no interval type support in Hive. Support for the interval type will be
added as a later item.

6. Compatibility with other types
The existing patch allows a lot of implicit conversion to/from other types (numeric, string).
 It does appear that TIMESTAMP has set a bit of a precedent in allowing a lot of implicit
type conversion.  However, given the limited operations with other types as described in above
from the SQL standard, I would propose limiting the amount of implicit conversion that is
allowed.  There are UDFs that the user can use to convert DATE into numeric/string values,
which can then be used in arithmetic or aggregation functions.  
                
> Create a new DATE datatype
> --------------------------
>
>                 Key: HIVE-3910
>                 URL: https://issues.apache.org/jira/browse/HIVE-3910
>             Project: Hive
>          Issue Type: Task
>            Reporter: Namit Jain
>
> It might be useful to have a DATE datatype along with timestamp.
> This can only store the day (possibly number of days from 1970-01-01,
> and would thus give space savings in binary format).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message