hive-dev mailing list archives

From Barna Zsombor Klara <zsombor.kl...@cloudera.com>
Subject Re: Review Request 56334: HIVE-12767: Implement table property to address Parquet int96 timestamp bug
Date Mon, 13 Feb 2017 13:59:41 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56334/
-----------------------------------------------------------

(Updated Feb. 13, 2017, 1:59 p.m.)


Review request for hive, Ryan Blue and Sergio Pena.


Changes
-------

Renamed the ParquetTableUtils.PARQUET_INT96_DEFAULT_WRITE_ZONE constant to make its purpose clearer.
It does not name a TimeZone that values are converted into and printed in; it is the adjustment
we apply, or more precisely the lack of an adjustment.


Bugs: HIVE-12767
    https://issues.apache.org/jira/browse/HIVE-12767


Repository: hive-git


Description
-------

This is a follow-up to this review request: https://reviews.apache.org/r/41821
The following exit criteria are addressed in this patch:

- Hive will read Parquet MR int96 timestamp data and adjust values using a time zone from
a table property, if set, or using the local time zone if it is absent. No adjustment will
be applied to data written by Impala.
- Hive will write Parquet int96 timestamps using a time zone adjustment from the same table
property, if set, or using the local time zone if it is absent. This keeps the data in the
table consistent.
- New tables created by Hive will set the table property to UTC if the global option to set
the property for new tables is enabled.
- Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the property unless
the global setting to do so is enabled.
- Tables created using CREATE TABLE LIKE <OTHER TABLE> will copy the property of the
table that is copied.
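
The read and write rules above could be sketched roughly as follows. This is an illustration
only, not the code in the patch: the class Int96ZoneSketch, the method resolveZone and the
writtenByImpala flag are hypothetical names, and modelling "no adjustment" as UTC is an
assumption. Only the property name is taken from this request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class Int96ZoneSketch {
    // Property name as given in the description of this review request.
    static final String WRITE_ZONE_PROPERTY = "parquet.mr.int96.write.zone";

    // Hypothetical helper: picks the zone used to adjust int96 timestamps
    // when reading or writing a table.
    static TimeZone resolveZone(Map<String, String> tableProps, boolean writtenByImpala) {
        if (writtenByImpala) {
            // Assumption: "no adjustment" is modelled as UTC (zero offset).
            return TimeZone.getTimeZone("UTC");
        }
        String zoneId = tableProps.get(WRITE_ZONE_PROPERTY);
        if (zoneId == null) {
            // Property absent: fall back to the local time zone.
            return TimeZone.getDefault();
        }
        return TimeZone.getTimeZone(zoneId);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(WRITE_ZONE_PROPERTY, "PST");
        System.out.println(resolveZone(props, false).getID());           // table property wins
        System.out.println(resolveZone(props, true).getID());            // Impala data: no adjustment
        System.out.println(resolveZone(new HashMap<>(), false).getID()); // local zone
    }
}
```

Note that java.util.TimeZone.getTimeZone silently falls back to GMT for an ID it does not
recognise, so a real implementation would probably want to validate the property value.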

To set the timezone table property, use this:
  create table tbl1 (ts timestamp) stored as parquet tblproperties ('parquet.mr.int96.write.zone'='PST');

To make UTC the default value of the timezone table property for newly created tables, use this:
  set parquet.mr.int96.enable.utc.write.zone=true;
  create table tbl2 (ts timestamp) stored as parquet;
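
How the property ends up on a new table, per the criteria listed earlier, might look something
like the sketch below. The class and method names are hypothetical and this is not the DDLTask
change itself. Two points are assumptions that the description does not state: that an explicit
value in tblproperties takes precedence over the global default, and that CREATE TABLE LIKE
<OTHER TABLE> does not apply the global default when the source table lacks the property.

```java
import java.util.HashMap;
import java.util.Map;

public class NewTablePropsSketch {
    // Property name as given in the description of this review request.
    static final String WRITE_ZONE_PROPERTY = "parquet.mr.int96.write.zone";

    // Hypothetical: properties for CREATE TABLE and CREATE TABLE LIKE FILE.
    // utcDefaultEnabled stands for parquet.mr.int96.enable.utc.write.zone.
    static Map<String, String> forNewTable(Map<String, String> declaredProps,
                                           boolean utcDefaultEnabled) {
        Map<String, String> result = new HashMap<>(declaredProps);
        // Assumption: an explicitly declared zone is never overwritten.
        if (utcDefaultEnabled && !result.containsKey(WRITE_ZONE_PROPERTY)) {
            result.put(WRITE_ZONE_PROPERTY, "UTC");
        }
        return result;
    }

    // Hypothetical: CREATE TABLE LIKE <OTHER TABLE> copies the source table's value.
    static Map<String, String> forTableLike(Map<String, String> sourceProps) {
        Map<String, String> result = new HashMap<>();
        String zone = sourceProps.get(WRITE_ZONE_PROPERTY);
        if (zone != null) {
            result.put(WRITE_ZONE_PROPERTY, zone);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> none = new HashMap<>();
        System.out.println(forNewTable(none, true));   // property set to UTC
        System.out.println(forNewTable(none, false));  // property not set
    }
}
```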


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 0e4f1f6610d2cdf543f106061a21ab465899737d
  data/files/impala_int96_timestamp.parq PRE-CREATION
  itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java a14b7900afb00a7d304b0dc4f6482a2b87716919
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java adabe70fa8f0fe1b990c6ac578a14ff5af06fc93
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 379a9135d9c631b2f473976b00f3dc87f9fec0c4
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 167f9b6516ac093fa30091daf6965de25e3eccb3
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 76d93b8e02a98c95da8a534f2820cd3e77b4bb43
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 604cbbcc2a9daa8594397e315cc4fd8064cc5005
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java ac430a67682d3dcbddee89ce132fc0c1b421e368
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 3fd75d24f3fda36967e4957e650aec19050b22f8
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java b6a1a7a64db6db0bf06d2eea70a308b88f06156e
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java 3d5c6e6a092dd6a0303fadc6a244dad2e31cd853
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 6b7b50a25e553629f0f492e964cc4913417cb500
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 934ae9f255d0c4ccaa422054fcc9e725873810d4
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java 670bfa609704d3001dd171b703b657f57fbd4c74
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java f537ceee505c5f41d513df3c89b63453012c9979
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java PRE-CREATION
  ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION
  ql/src/test/queries/clientpositive/parquet_timestamp_conversion.q PRE-CREATION
  ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION
  ql/src/test/results/clientpositive/parquet_timestamp_conversion.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/56334/diff/


Testing
-------

qtest and unit tests added.


Thanks,

Barna Zsombor Klara

