hive-dev mailing list archives

From Barna Zsombor Klara <zsombor.kl...@cloudera.com>
Subject Re: Review Request 56334: HIVE-12767: Implement table property to address Parquet int96 timestamp bug
Date Mon, 13 Feb 2017 12:58:43 GMT


> On Feb. 10, 2017, 4:56 p.m., Zoltan Ivanfi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java, line 150
> > <https://reviews.apache.org/r/56334/diff/4-5/?file=1628592#file1628592line150>
> >
> >     This issue is scattered around in different parts of the code, but this is where I first noticed it: PARQUET_INT96_DEFAULT_WRITE_ZONE is set to UTC by default and the time zone adjustment is set to this value if not specified by a table property.
> >
> >     This does not match the exit criteria, which state that the local timezone must be used if the table property is missing. (There is a separate global switch controlling the default value of the table property to set when creating new tables, but that's a different thing.)

I think this setting is the correct one. If you check NanoTimeUtils, the calendar we pass in
is used to adjust the default calendar (relative to UTC). If we want to keep it as the default,
we should pass in the zero adjustment, which is UTC/GMT.
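
To illustrate the point about a zero adjustment, here is a minimal sketch using only java.util classes. It is not the NanoTimeUtils code; the class ZoneAdjustmentSketch and the helper epochMillis are hypothetical names made up for this example. It shows that a UTC calendar carries a zero offset, while interpreting the same wall-clock time in another zone shifts the stored instant.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class ZoneAdjustmentSketch {
    // Hypothetical helper (not part of Hive): interpret the given wall-clock
    // fields in the supplied zone and return the resulting epoch milliseconds.
    static long epochMillis(int year, int month, int day, int hour, TimeZone zone) {
        Calendar cal = Calendar.getInstance(zone);
        cal.clear();                               // drop the current time fields
        cal.set(year, month - 1, day, hour, 0, 0); // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");

        long asUtc = epochMillis(2017, 2, 13, 12, utc);
        long asLa = epochMillis(2017, 2, 13, 12, la);

        // UTC applies no adjustment at all.
        System.out.println(utc.getOffset(asUtc));        // prints 0
        // The same wall-clock noon lands 8 hours later when read as Pacific time.
        System.out.println((asLa - asUtc) / 3_600_000L); // prints 8
    }
}
```

Under this reading, passing a UTC calendar leaves values untouched, and any other zone introduces a shift equal to that zone's offset at the given instant.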


- Barna Zsombor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56334/#review165133
-----------------------------------------------------------


On Feb. 10, 2017, 1:41 p.m., Barna Zsombor Klara wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56334/
> -----------------------------------------------------------
> 
> (Updated Feb. 10, 2017, 1:41 p.m.)
> 
> 
> Review request for hive, Ryan Blue and Sergio Pena.
> 
> 
> Bugs: HIVE-12767
>     https://issues.apache.org/jira/browse/HIVE-12767
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This is a followup on this review request: https://reviews.apache.org/r/41821
> The following exit criteria are addressed in this patch:
> 
> - Hive will read Parquet MR int96 timestamp data and adjust values using a time zone from a table property, if set, or using the local time zone if it is absent. No adjustment will be applied to data written by Impala.
> - Hive will write Parquet int96 timestamps using a time zone adjustment from the same table property, if set, or using the local time zone if it is absent. This keeps the data in the table consistent.
> - New tables created by Hive will set the table property to UTC if the global option to set the property for new tables is enabled.
> - Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the property unless the global setting to do so is enabled.
> - Tables created using CREATE TABLE LIKE <OTHER TABLE> will copy the property of the table that is copied.
> 
> To set the timezone table property, use this:
>   create table tbl1 (ts timestamp) stored as parquet tblproperties ('parquet.mr.int96.write.zone'='PST');
> 
> To set UTC as the default timezone table property on newly created tables, use this:
>   set parquet.mr.int96.enable.utc.write.zone=true;
>   create table tbl2 (ts timestamp) stored as parquet;
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b27b663b94f41a8250b79139ed9f7275b10cf9a3
>   data/files/impala_int96_timestamp.parq PRE-CREATION
>   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java a14b7900afb00a7d304b0dc4f6482a2b87716919
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java adabe70fa8f0fe1b990c6ac578a14ff5af06fc93
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 379a9135d9c631b2f473976b00f3dc87f9fec0c4
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 167f9b6516ac093fa30091daf6965de25e3eccb3
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 76d93b8e02a98c95da8a534f2820cd3e77b4bb43
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 604cbbcc2a9daa8594397e315cc4fd8064cc5005
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java ac430a67682d3dcbddee89ce132fc0c1b421e368
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 3fd75d24f3fda36967e4957e650aec19050b22f8
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java b6a1a7a64db6db0bf06d2eea70a308b88f06156e
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java 3d5c6e6a092dd6a0303fadc6a244dad2e31cd853
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 6b7b50a25e553629f0f492e964cc4913417cb500
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 934ae9f255d0c4ccaa422054fcc9e725873810d4
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java 670bfa609704d3001dd171b703b657f57fbd4c74
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java f537ceee505c5f41d513df3c89b63453012c9979
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java PRE-CREATION
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_timestamp_conversion.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_timestamp_conversion.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/56334/diff/
> 
> 
> Testing
> -------
> 
> qtest and unit tests added.
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>
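
For readers following the discussion, the first two exit criteria in the quoted description (use the zone from the table property if set, otherwise the local time zone) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the class Int96ZoneSketch and the method resolveZone are hypothetical, table properties are modeled as a plain Map, and only the property name parquet.mr.int96.write.zone comes from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class Int96ZoneSketch {
    // Property name taken from the review description.
    static final String WRITE_ZONE_PROPERTY = "parquet.mr.int96.write.zone";

    // Hypothetical resolver: use the table property when present,
    // otherwise fall back to the supplied local zone.
    static TimeZone resolveZone(Map<String, String> tableProperties, TimeZone localZone) {
        String id = tableProperties.get(WRITE_ZONE_PROPERTY);
        if (id == null || id.trim().isEmpty()) {
            return localZone;
        }
        // Note: TimeZone.getTimeZone silently returns GMT for unknown IDs,
        // so real code would likely want to validate the value first.
        return TimeZone.getTimeZone(id.trim());
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        TimeZone local = TimeZone.getDefault();

        // Property absent: the local zone is used.
        System.out.println(resolveZone(props, local).getID().equals(local.getID())); // prints true

        // Property set: the table's zone wins.
        props.put(WRITE_ZONE_PROPERTY, "UTC");
        System.out.println(resolveZone(props, local).getID()); // prints UTC
    }
}
```

The question raised in the review is which zone the fallback branch should return; in this sketch it is a parameter, so either choice (local zone or UTC) can be tried without changing the rest of the logic.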

