hive-dev mailing list archives

From "Mikhail Bautin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038
Date Thu, 16 May 2013 01:51:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659128#comment-13659128 ]

Mikhail Bautin commented on HIVE-4525:
--------------------------------------

I am not quite sure how to preserve backward compatibility in the "writable" part of the
{{TimestampWritable}} code ({{write}}/{{readFields}}) if we switch to a unified nanosecond-timestamp-as-long
format. If {{readFields}} is presented with eight bytes, would it interpret them as a four-byte
int followed by a VInt or as a long nanosecond timestamp? Would it attempt to do the former
and revert to the latter if there are inconsistencies? What if the bytes of a long nanosecond
timestamp also happen to represent a valid legacy (int/VInt) timestamp?
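
To make the ambiguity concrete, here is a minimal sketch of what a reader expecting the legacy layout
would see when handed the eight bytes of a long nanosecond timestamp. The class and method names are
made up for illustration and are not part of {{TimestampWritable}}; the only outside fact assumed is
that Hadoop encodes a VInt in the range -112..127 as a single byte.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not Hive code: shows that the bytes of a long nanosecond
// timestamp can also be parsed as a four-byte int followed by a one-byte VInt.
final class AmbiguousTimestampBytes {

    /** Encodes nanoseconds since the epoch as eight big-endian bytes (the assumed unified format). */
    static byte[] encodeNanosAsLong(long nanosSinceEpoch) {
        return ByteBuffer.allocate(Long.BYTES).putLong(nanosSinceEpoch).array();
    }

    /** Reads the first four bytes the way a legacy reader would: as a big-endian int. */
    static int leadingIntOf(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** True if this byte on its own is a complete Hadoop VInt (values -112..127 take one byte). */
    static boolean isSingleByteVInt(byte b) {
        return b >= -112;
    }

    public static void main(String[] args) {
        // 2013-05-16T01:51:15Z expressed in nanoseconds since the epoch.
        long nanos = 1_368_669_075L * 1_000_000_000L;
        byte[] bytes = encodeNanosAsLong(nanos);
        System.out.println("leading int seen by a legacy reader: " + leadingIntOf(bytes));
        System.out.println("fifth byte parses as a one-byte VInt: " + isSingleByteVInt(bytes[4]));
    }
}
```

Nothing in the bytes themselves tells the reader which interpretation was intended, which is why a
"try the old format first, then fall back" strategy cannot be made reliable.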

In my patch, I try to maintain backward compatibility as much as possible. If a timestamp
is in the range that can be represented by the old format, it is serialized using the old
format. The extended format I've proposed and implemented for the full timestamp range builds
on top of the existing one and can be unambiguously distinguished from the old format by examining
serialized bytes.
In addition, the included test, {{TestTimestampWritable}}, tests both the old and the new
(extended) format, as well as double/BigDecimal conversion, getters/setters/constructors, and
everything else I could test in {{TimestampWritable}}.
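
The general idea of "old format when the value fits, extended format otherwise" can be sketched as
follows. This is a toy layout for illustration only and is NOT the format in D10755.1.patch: it
ignores fractional seconds, and the marker word ({{EXTENDED_MARKER}}) and all names are assumptions.
It only shows how a reader can pick the format by looking at the serialized bytes.

```java
import java.nio.ByteBuffer;

// Toy codec, not the patch's format: seconds only, marker word invented for illustration.
final class RangeBasedTimestampCodec {

    // Assumed marker: an int with the high bit set, which the toy legacy form never produces.
    static final int EXTENDED_MARKER = Integer.MIN_VALUE;

    /** Seconds that fit in the lower 31 bits of an int, i.e. roughly 1970 through early 2038. */
    static boolean fitsLegacyRange(long seconds) {
        return seconds >= 0 && seconds <= Integer.MAX_VALUE;
    }

    static byte[] encode(long seconds) {
        if (fitsLegacyRange(seconds)) {
            // Toy legacy form: four bytes, high bit clear.
            return ByteBuffer.allocate(Integer.BYTES).putInt((int) seconds).array();
        }
        // Toy extended form: marker word followed by the full 64-bit seconds value.
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(EXTENDED_MARKER)
                .putLong(seconds)
                .array();
    }

    static long decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int first = buf.getInt();
        // High bit clear means the toy legacy form; otherwise read the extended payload.
        return first >= 0 ? first : buf.getLong();
    }

    public static void main(String[] args) {
        for (long s : new long[] {0L, Integer.MAX_VALUE, -1L, Integer.MAX_VALUE + 1L}) {
            byte[] b = encode(s);
            System.out.println(s + " -> " + b.length + " bytes -> " + decode(b));
        }
    }
}
```

Values inside the old range keep their old byte representation, so data written by older code
remains readable and older readers still understand anything written within that range.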

I am sure there is a way to handle vector optimizations for timestamps in a backward-compatible
way, and I don't think this patch would make it much more complicated than it already is.
However, vectorized computations are a performance optimization, while this issue is a correctness
fix. Currently, timestamps outside of the ~1970-2038 range are silently corrupted in
some queries, and this patch fixes that. It is also pretty small and immediately
available.
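
As a rough illustration of how a value outside the range can come back wrong without any error,
the sketch below keeps only the lower 31 bits of the seconds value, as the issue description says
the current serialization does. The class and method names are hypothetical, and the exact corruption
in Hive depends on the code path; this only shows the effect of the masking itself.

```java
import java.time.Instant;

// Hypothetical demonstration, not Hive code: what survives when only 31 bits of seconds are kept.
final class ThirtyOneBitTruncation {

    /** Keeps only the lower 31 bits of the seconds value. */
    static int storeInLow31Bits(long seconds) {
        return (int) (seconds & 0x7FFFFFFFL);
    }

    /** Writes and reads back a timestamp through the 31-bit representation. */
    static Instant roundTrip(Instant original) {
        return Instant.ofEpochSecond(storeInLow31Bits(original.getEpochSecond()));
    }

    public static void main(String[] args) {
        Instant before1970 = Instant.parse("1969-12-31T23:59:59Z");
        Instant after2038 = Instant.parse("2038-01-19T03:14:08Z");
        // prints 1969-12-31T23:59:59Z -> 2038-01-19T03:14:07Z
        System.out.println(before1970 + " -> " + roundTrip(before1970));
        // prints 2038-01-19T03:14:08Z -> 1970-01-01T00:00:00Z
        System.out.println(after2038 + " -> " + roundTrip(after2038));
    }
}
```

Both inputs come back as valid-looking timestamps inside the supported range, which is what makes
this kind of corruption silent.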


                
> Support timestamps earlier than 1970 and later than 2038
> --------------------------------------------------------
>
>                 Key: HIVE-4525
>                 URL: https://issues.apache.org/jira/browse/HIVE-4525
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D10755.1.patch
>
>
> TimestampWritable currently serializes timestamps using the lower 31 bits of an int.
> This makes it impossible to store timestamps earlier than 1970 or later than a certain
> point in 2038.
