hive-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: Parquet support for Timestamp in 0.14
Date Thu, 19 Feb 2015 20:08:09 GMT
Szehon:

Another question related to type support:

If I convert an Avro UNION field to Parquet, does Hive support that
UNION field? A UNION is needed because an Avro field cannot take NULL, so I
have to define every field as a UNION of the original type and NULL.
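For reference, a nullable Avro field is usually declared as a two-branch union of "null" and the real type. The Python sketch below (the `Event` schema and the helper names are hypothetical) shows how such a `["null", T]` union can be recognized and treated as a single optional column of type T. To my understanding, parquet-avro maps these unions the same way when writing Parquet, so the file may end up with plain optional columns rather than a UNION. Whether Hive 0.14 reads them that way is not established here.

```python
# Hypothetical Avro record schema: each field is a union of "null" and its
# original type, which is how Avro expresses a nullable field.
SCHEMA = {
    "type": "record",
    "name": "Event",  # hypothetical record name
    "fields": [
        {"name": "id", "type": ["null", "long"], "default": None},
        {"name": "name", "type": ["null", "string"], "default": None},
        # A wider union, shown for contrast; it is not a simple nullable field.
        {"name": "tag", "type": ["null", "string", "long"], "default": None},
    ],
}


def nullable_branch(field_type):
    """Return T if field_type is exactly ["null", T] (in either order), else None."""
    if isinstance(field_type, list) and len(field_type) == 2 and "null" in field_type:
        other = [t for t in field_type if t != "null"]
        if len(other) == 1:
            return other[0]
    return None


def column_kind(field_type):
    """Classify a field type under the 'nullable union -> optional column' convention."""
    branch = nullable_branch(field_type)
    if branch is not None:
        return f"optional {branch}"
    if isinstance(field_type, list):
        # Assumption: a multi-branch union needs more than one plain column.
        return "general union"
    return f"required {field_type}"


print({f["name"]: column_kind(f["type"]) for f in SCHEMA["fields"]})
```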

Thanks
Yang

On Mon, Feb 9, 2015 at 1:05 PM, Yang <teddyyyy123@gmail.com> wrote:

> Thanks Szehon!
>
> On Tue, Feb 3, 2015 at 7:33 PM, Szehon Ho <szehon@cloudera.com> wrote:
>
>> Hi Yang
>>
>> I saw you posted this question in several places. I answered the timestamp
>> question in HIVE-6394, as I saw that one first.
>>
>> Can't speak about date support, as it's not in my area of knowledge.
>>
>> Thanks
>> Szehon
>>
>> On Mon, Feb 2, 2015 at 4:15 PM, Yang <teddyyyy123@gmail.com> wrote:
>>
>>> The Parquet spec on logical types, and Timestamp specifically, seems
>>> to say
>>> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
>>> "TIMESTAMP_MILLIS is used for a combined logical date and time type. It
>>> must annotate an int64 that stores the number of milliseconds from the
>>> Unix epoch, 00:00:00.000 on 1 January 1970, UTC."
>>>
>>>
>>> i.e., here it says that the type is precise only to milliseconds and
>>> counts from 1970.
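A minimal Python sketch of the TIMESTAMP_MILLIS encoding quoted above, assuming a time-zone-aware input: the value is a signed int64 count of milliseconds since 1970-01-01T00:00:00Z, so anything finer than a millisecond is dropped. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(ts: datetime) -> int:
    """Encode an aware datetime as int64 milliseconds since the Unix epoch."""
    delta = ts - EPOCH
    # Integer arithmetic on the normalized timedelta fields avoids float
    # rounding; // floors, so sub-millisecond precision is truncated downward.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def from_timestamp_millis(millis: int) -> datetime:
    """Decode int64 milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


ts = datetime(2015, 2, 2, 16, 15, 0, 123456, tzinfo=timezone.utc)
m = to_timestamp_millis(ts)
print(m, from_timestamp_millis(m))  # the 456 microseconds are lost
```

Because of the floor division, instants before 1970 encode as negative values: one microsecond before the epoch becomes -1.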
>>>
>>>
>>> But if you look at the Hive Parquet code in
>>>
>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142
>>>
>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54
>>> it seems that Hive's encoding of timestamps in Parquet follows a different
>>> spec: it is precise to nanoseconds and counts from "Monday, January 1,
>>> 4713" (i.e., 4713 BC, the start of the Julian Day count, as defined in
>>> jodd.datetime.JDateTime).
>>>
>>>
>>> So is Hive's Parquet timestamp storage completely different from the
>>> above spec?
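For comparison, a Python sketch of the layout as I read NanoTime.java at the link above: a 12-byte INT96 value holding a little-endian int64 of nanoseconds since midnight, followed by a little-endian int32 Julian day number. This assumes UTC and ignores any time-zone adjustment Hive may apply; the helper names are hypothetical. Python's datetime only carries microseconds, so the last three digits of the nanosecond field are always zero here.

```python
import struct
from datetime import date, datetime, timedelta, timezone

# date.toordinal() + 1721425 gives the Julian Day Number of a
# proleptic Gregorian date; for 1970-01-01 that is 2440588.
JDN_OFFSET = 1_721_425
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
assert date(1970, 1, 1).toordinal() + JDN_OFFSET == JULIAN_DAY_OF_UNIX_EPOCH


def to_int96(ts: datetime) -> bytes:
    """Pack an aware datetime as <int64 nanos-of-day><int32 julian day>, little-endian."""
    ts = ts.astimezone(timezone.utc)  # assumption: no Hive time-zone shifting
    julian_day = ts.date().toordinal() + JDN_OFFSET
    midnight = datetime(ts.year, ts.month, ts.day, tzinfo=timezone.utc)
    delta = ts - midnight  # always within one day, so delta.days == 0
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, julian_day)  # 8 + 4 = 12 bytes


def from_int96(raw: bytes) -> datetime:
    """Inverse of to_int96, truncating nanoseconds to microseconds."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    day = date.fromordinal(julian_day - JDN_OFFSET)
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return midnight + timedelta(microseconds=nanos_of_day // 1000)


ts = datetime(2015, 2, 2, 16, 15, 0, 123456, tzinfo=timezone.utc)
raw = to_int96(ts)
print(len(raw), raw.hex())
```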
>>>
>>>
>>>
>>>
>>> What about Date support (https://issues.apache.org/jira/browse/HIVE-8119)?
>>> Are we going to have a different on-disk binary encoding than the
>>> "int32" specified in the above doc?
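For the DATE question, the spec's encoding is an int32 count of days since 1970-01-01. A short Python sketch (helper names are illustrative) also shows the Julian day of the same date, for contrast with the INT96 day field; it says nothing about what HIVE-8119 will actually write.

```python
from datetime import date, timedelta

UNIX_EPOCH_DATE = date(1970, 1, 1)
# Julian Day Number of 1970-01-01, the kind of day count INT96 timestamps use.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588


def to_parquet_date(d: date) -> int:
    """DATE logical type: signed days since the Unix epoch, stored as int32."""
    return (d - UNIX_EPOCH_DATE).days


def from_parquet_date(days: int) -> date:
    """Inverse of to_parquet_date."""
    return UNIX_EPOCH_DATE + timedelta(days=days)


def julian_day(d: date) -> int:
    """The same date expressed as a Julian Day Number, for comparison."""
    return to_parquet_date(d) + JULIAN_DAY_OF_UNIX_EPOCH


d = date(2015, 2, 19)
print(to_parquet_date(d), julian_day(d))  # 16485 2457073
```

The two day counts differ only by a constant offset, so converting between them is cheap; the real difference in the timestamp case is the extra nanoseconds-of-day field.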
>>>
>>> thanks
>>> Yang
>>>
>>>
>>>
>>>
>>>
>>
>
