hive-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: Parquet support for Timestamp in 0.14
Date Thu, 19 Feb 2015 21:13:31 GMT
Ah, found it. My issue is that Hive 0.13 doesn't handle this correctly;
it could be a bug.

With 0.14 it works.

Btw, the UNION[int, null] translates to Parquet as a field "optional int32
myfieldName". I found this by calling ParquetFileReader.readFooter().
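
A minimal sketch of that mapping, assuming only the [T, null] case with a
primitive T. UnionToParquetSketch, parquetPrimitive and parquetField are
hypothetical names made up for illustration, not part of parquet-avro or Hive:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionToParquetSketch {
    // Hypothetical helper: Avro primitive type name -> Parquet primitive type name.
    static String parquetPrimitive(String avroType) {
        switch (avroType) {
            case "int": return "int32";
            case "long": return "int64";
            case "float": return "float";
            case "double": return "double";
            case "boolean": return "boolean";
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + avroType);
        }
    }

    // Hypothetical helper: a union containing "null" becomes an optional field,
    // otherwise the field is required. Only one non-null branch is handled.
    static String parquetField(String fieldName, List<String> unionBranches) {
        List<String> nonNull = new ArrayList<>();
        for (String branch : unionBranches) {
            if (!"null".equals(branch)) {
                nonNull.add(branch);
            }
        }
        if (nonNull.size() != 1) {
            throw new IllegalArgumentException("only [T, null] style unions are handled");
        }
        String repetition = nonNull.size() < unionBranches.size() ? "optional" : "required";
        return repetition + " " + parquetPrimitive(nonNull.get(0)) + " " + fieldName;
    }

    public static void main(String[] args) {
        // prints: optional int32 myfieldName
        System.out.println(parquetField("myfieldName", Arrays.asList("int", "null")));
    }
}
```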



On Thu, Feb 19, 2015 at 12:08 PM, Yang <teddyyyy123@gmail.com> wrote:

> Szehon:
>
> another question related to the types support:
>
> If I convert an Avro field of type UNION to Parquet, does Hive support that
> UNION field? A UNION is needed because a plain Avro field cannot take NULL,
> so I have to define every field as a UNION of the original type and NULL.
>
> Thanks
> Yang
>
> On Mon, Feb 9, 2015 at 1:05 PM, Yang <teddyyyy123@gmail.com> wrote:
>
>> Thanks Szehon!
>>
>> On Tue, Feb 3, 2015 at 7:33 PM, Szehon Ho <szehon@cloudera.com> wrote:
>>
>>> Hi Yang
>>>
>>> I saw you posted this question in several places, I gave an answer in
>>> HIVE-6394 as I saw that one first, to the timestamp query.
>>>
>>> Can't speak about date support, as it's not in my knowledge.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Mon, Feb 2, 2015 at 4:15 PM, Yang <teddyyyy123@gmail.com> wrote:
>>>
>>>> The Parquet spec on logical types, and Timestamp specifically, seems
>>>> to say:
>>>> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
>>>> "TIMESTAMP_MILLIS is used for a combined logical date and time type.
>>>> It must annotate an int64 that stores the number of milliseconds from
>>>> the Unix epoch, 00:00:00.000 on 1 January 1970, UTC."
>>>>
>>>>
>>>> i.e. here it says that the type is only precise to the
>>>> millisecond and that it counts from 1970.
>>>>
>>>>
>>>> But if you look at the hive-parquet code in
>>>>
>>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java#L142
>>>>
>>>> https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java#L54
>>>> it seems that Hive's encoding of timestamps in Parquet follows a
>>>> different spec, precise to the nanosecond, and starting from "Monday,
>>>> January 1, 4713 " (defined in jodd.datetime.JDateTime)
>>>>
>>>>
>>>> So is Hive's Parquet timestamp storage completely different from the
>>>> above spec?
>>>>
>>>>
>>>>
>>>>
>>>> what about Date support?
>>>> https://issues.apache.org/jira/browse/HIVE-8119
>>>> are we going to have a different on-disk binary encoding than the
>>>> "int32" specified in the above doc?
>>>>
>>>> thanks
>>>> Yang
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

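For reference, the two timestamp encodings discussed in the quoted message
can be related with a little arithmetic. The sketch below assumes a 12-byte
little-endian value holding 8 bytes of nanoseconds within the day followed by
a 4-byte Julian day number, and relies on Julian day 2440588 being
1 January 1970. NanoTimeSketch and toEpochMillis are made-up names for
illustration, not Hive's API, and the sketch ignores any time zone
adjustment Hive may apply.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NanoTimeSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Converts (Julian day, nanoseconds within that day) to milliseconds
    // since the Unix epoch. Sub-millisecond precision is dropped.
    static long toEpochMillis(int julianDay, long nanosOfDay) {
        return (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    // Assumed layout: 8 bytes nanos-of-day, then 4 bytes Julian day, little-endian.
    static long toEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        return toEpochMillis(julianDay, nanosOfDay);
    }

    public static void main(String[] args) {
        // 1.5 seconds into the day after the Unix epoch.
        byte[] raw = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(1_500_000_000L)
                .putInt(2440589)
                .array();
        // prints: 86401500
        System.out.println(toEpochMillis(raw));
    }
}
```

The integer division shows where the extra precision goes: anything finer
than a millisecond cannot be represented in the TIMESTAMP_MILLIS form.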