hive-dev mailing list archives

From "Szehon Ho" <sze...@cloudera.com>
Subject Re: Review Request 30337: HIVE-9482 : Hive parquet timestamp compatibility
Date Tue, 27 Jan 2015 23:07:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30337/
-----------------------------------------------------------

(Updated Jan. 27, 2015, 11:07 p.m.)


Review request for hive and Brock Noland.


Bugs: HIVE-9482
    https://issues.apache.org/jira/browse/HIVE-9482


Repository: hive-git


Description
-------

In the current Hive implementation, Parquet timestamps are stored in UTC (converted from the
current time zone), following the original Parquet timestamp spec.
However, we found that this is not compatible with other tools, and after some investigation
it is not how the other file formats, or even some databases, behave (Hive's Timestamp is
closer to a 'timestamp without timezone' datatype).

This is the first part of the fix, which will restore compatibility with parquet-timestamp
files generated by external tools by skipping conversion on reading.

A later fix will change the write path to not convert, and will stop the read conversion even
for files written by Hive itself.
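
To illustrate the read-path behavior described above, here is a minimal sketch in plain
java.time. It is not the patch itself: the class name, the method names, and the
skipConversion flag are all hypothetical, and the assumption that an external writer stores
the wall-clock value unchanged is taken from the description above rather than from any spec.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration only; none of these names come from the Hive code base.
class TimestampReadSketch {

    // Write path as described above: the wall-clock value is shifted to UTC before storing.
    static LocalDateTime writeWithConversion(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone)
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    // Read path: either shift the stored value from UTC back to the reader's zone,
    // or (skipConversion == true) return the stored value untouched.
    static LocalDateTime read(LocalDateTime stored, boolean skipConversion, ZoneId readerZone) {
        if (skipConversion) {
            return stored;
        }
        return stored.atZone(ZoneOffset.UTC)
                .withZoneSameInstant(readerZone)
                .toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        LocalDateTime wallClock = LocalDateTime.of(2015, 1, 27, 15, 7, 45);

        // File written with conversion: reading with conversion round-trips the value.
        LocalDateTime convertedOnWrite = writeWithConversion(wallClock, zone);
        System.out.println(read(convertedOnWrite, false, zone));

        // File assumed to hold the wall-clock value as-is (external writer):
        // converting on read shifts it, skipping the conversion preserves it.
        System.out.println(read(wallClock, false, zone));
        System.out.println(read(wallClock, true, zone));
    }
}
```

In this sketch, a value written without conversion comes back shifted by the zone offset when
the reader converts, and comes back unchanged when the reader skips the conversion.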


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 64e7e0a
  data/files/parquet_external_time.parq PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java a86d6f4
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java 000e8ea
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 23bb364
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java 872900b
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java 11772be
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java eeb3838
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java af28b4c
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3f8e4d7
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 4e4d7fd
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java c647b24
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java 2e788bd
  ql/src/test/queries/clientpositive/parquet_external_time.q PRE-CREATION
  ql/src/test/results/clientpositive/parquet_external_time.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/30337/diff/


Testing
-------

Added new unit tests (TestParquetTimestampUtils) to test the non-conversion code path.

Also added a new q-test that reads a Parquet timestamp file generated by an external tool, in
this case Impala.


Thanks,

Szehon Ho

