hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Barna Zsombor Klara <zsombor.kl...@cloudera.com>
Subject Re: Review Request 56334: HIVE-12767: Implement table property to address Parquet int96 timestamp bug
Date Wed, 08 Feb 2017 13:09:44 GMT


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java, lines
173-177
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625160#file1625160line173>
> >
> >     Isn't it very simmilar to getParquetWriterTimeZone? Wouldn't it be usefull to
refactor this to the ParquetUtil class?

The code looks similar but we are retrieving the value from two different containers (properties
vs configuration object). I couldn't find a way to easily consolidate this at first glance.


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java, line
205
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625161#file1625161line205>
> >
> >     Isn't there a good way to cache this Calendar? Wil we create one for every conversion?

The calendar is used and mutated during the timezone adjustment, so we must be cautious about
reusing it. Sharing across threads is definitely a bad idea and reusing it inside the thread
is probably not a huge gain. Maybe Sergio has some more insight on this?


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java,
lines 418-423
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625167#file1625167line418>
> >
> >     Is there a way to reuse the calendar instance here?

Same as above.


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java, lines
238-242
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625170#file1625170line238>
> >
> >     nit: indentation?

The indentation is in line with the preexistent **and** highly artistic style of the class.
I would leave this as is. :)


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java, lines
249-252
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625170#file1625170line249>
> >
> >     nit: indentation?

The indentation is in line with the preexistent **and** highly artistic style of the class.
I would leave this as is.


> On Feb. 7, 2017, 10:38 a.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java,
line 240
> > <https://reviews.apache.org/r/56334/diff/1/?file=1625173#file1625173line240>
> >
> >     Would not be this flaky? At least dependent on the tester local settings? What
happens when running it on Special times when changing between summer/winter time?

No, this test should be consistent. The timezone adjustment in NanoTimeUtils should take care
of it.


- Barna Zsombor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56334/#review164481
-----------------------------------------------------------


On Feb. 7, 2017, 2:44 p.m., Barna Zsombor Klara wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56334/
> -----------------------------------------------------------
> 
> (Updated Feb. 7, 2017, 2:44 p.m.)
> 
> 
> Review request for hive, Ryan Blue and Sergio Pena.
> 
> 
> Bugs: HIVE-12767
>     https://issues.apache.org/jira/browse/HIVE-12767
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This is a followup on this review request: https://reviews.apache.org/r/41821
> The following exit criteria is addressed in this patch:
> 
> - Hive will read Parquet MR int96 timestamp data and adjust values using a time zone
from a table property, if set, or using the local time zone if it is absent. No adjustment
will be applied to data written by Impala.
> - Hive will write Parquet int96 timestamps using a time zone adjustment from the same
table property, if set, or using the local time zone if it is absent. This keeps the data
in the table consistent.
> - New tables created by Hive will set the table property to UTC if the global option
to set the property for new tables is enabled.
> - Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set the property
unless the global setting to do so is enabled.
> - Tables created using CREATE TABLE LIKE <OTHER TABLE> will copy the property of
the table that is copied.
> 
> To set the timezone table property, use this:
>   create table tbl1 (ts timestamp) stored as parquet tblproperties ('parquet.mr.int96.write.zone'='PST');
> 
> To set UTC as default timezone table property on new tables created, use this: 
>   set parquet.mr.int96.enable.utc.write.zone=true;
>   create table tbl2 (ts timestamp) stored as parquet;
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cb27cd6236dc1d54efd2cafda9f1cc975e9d9817

>   data/files/impala_int96_timestamp.parq PRE-CREATION 
>   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java
a14b7900afb00a7d304b0dc4f6482a2b87716919 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java adabe70fa8f0fe1b990c6ac578a14ff5af06fc93

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java 379a9135d9c631b2f473976b00f3dc87f9fec0c4

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 167f9b6516ac093fa30091daf6965de25e3eccb3

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 76d93b8e02a98c95da8a534f2820cd3e77b4bb43

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
604cbbcc2a9daa8594397e315cc4fd8064cc5005 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java
ac430a67682d3dcbddee89ce132fc0c1b421e368 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetTableUtils.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 3fd75d24f3fda36967e4957e650aec19050b22f8

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
699de59b58ecbfa0705f042d22e697f1573ea496 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java
3d5c6e6a092dd6a0303fadc6a244dad2e31cd853 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
f4621e5dbb81e8d58c4572c901ec9d1a7ca8c012 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 6b7b50a25e553629f0f492e964cc4913417cb500

>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 934ae9f255d0c4ccaa422054fcc9e725873810d4

>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
833cfdbaa2ac6a3653a8f3232344f5fb70c80686 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java PRE-CREATION

>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
ec6def5b9ac5f12e6a7cb24c4f4998a6ca6b4a8e 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/timestamp/TestNanoTimeUtils.java PRE-CREATION

>   ql/src/test/queries/clientpositive/parquet_int96_timestamp.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_int96_timestamp.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/56334/diff/
> 
> 
> Testing
> -------
> 
> qtest and unit tests added.
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message