parquet-dev mailing list archives

From Ryan Blue <b...@cloudera.com>
Subject Re: Int96 compatibility with parquet-pig
Date Tue, 03 Mar 2015 16:54:24 GMT
On 03/03/2015 05:32 AM, Guillaume Polaert wrote:
> Hi,
>
> Is there someone working on Int96/Impala compatibility for parquet-pig?
>
> If I understand correctly, Parquet doesn't have a nanosecond-precision
> Timestamp class that can handle this format. Is that right?
>
> At least, we can provide a feature to map Int96 to DateTime (pigloader) and
> vice versa, losing nanosecond precision of course.
> What do you think?

I don't think that the Parquet community should add support for int96 
timestamps. The int96 timestamp format is undocumented, though 
implemented in Hive and Impala. It also uses an unannotated int96, so 
there is no way to distinguish between a real int96 and a timestamp.

I don't think that a file format like Parquet should add support for 
undocumented types that are specific to an application. Applications are 
free to store data in Parquet's types as they like by keeping additional 
metadata (the column's timestamp type in Impala), but the format should 
only coordinate those higher-level types through annotations.

I think support for Impala's int96 timestamp should be done in a UDF as 
was suggested on PARQUET-195, and we should add a nanosecond-precision 
timestamp type annotation to coordinate future uses.
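
As a rough illustration of the conversion such a UDF would perform, here is a minimal Python sketch. It is not taken from Parquet, Hive, or Impala. It assumes the layout these writers are commonly understood to use (12 bytes, little-endian: 8 bytes of nanoseconds within the day, then a 4-byte Julian day number) and it assumes the value is in UTC. The function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte int96 timestamp into nanoseconds since the Unix epoch.

    Assumed layout (not a documented Parquet type): little-endian int64
    nanoseconds-of-day followed by little-endian int32 Julian day.
    """
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days_since_epoch * NANOS_PER_DAY + nanos_of_day


def int96_to_datetime(raw: bytes) -> datetime:
    """Convert to a datetime, truncating to microseconds.

    Nanosecond precision is lost here, as it would be when mapping to a
    Pig DateTime. Treating the value as UTC is an assumption.
    """
    micros = int96_to_epoch_nanos(raw) // 1_000
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=micros)
```

Because the column is an unannotated int96, nothing in the file says these bytes are a timestamp. The caller has to know that from metadata kept outside the file, which is why this belongs in a UDF rather than in the format.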

Does that sound like a reasonable way forward?

rb

-- 
Ryan Blue
Software Engineer
Cloudera, Inc.
