impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Impala Public Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Tue, 02 May 2017 20:24:09 GMT
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Before this change:
Hive adjusts timestamps by subtracting the local time zone's offset
from all values when writing data to Parquet files. Hive is internally
inconsistent because it behaves differently for other file formats. As
a result of this adjustment, Impala may read "incorrect" timestamp
values from Parquet files written by Hive.

After this change:
Impala reads Parquet MR timestamp data and adjusts values using a time
zone from a table property (parquet.mr.int96.write.zone), if set, and
will not adjust it if the property is absent. No adjustment will be
applied to data written by Impala.

New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
LIKE <file> will set the table property to UTC if the global flag
--set_parquet_mr_int96_write_zone_to_utc_on_new_tables is set to true.

HDFS tables created by Impala using CREATE TABLE LIKE <other table>
will copy the property of the table that is copied.

This change also affects the way Impala deals with
--convert_legacy_hive_parquet_utc_timestamps global flag (introduced
in IMPALA-1658). The flag will be taken into account only if
parquet.mr.int96.write.zone table property is not set and ignored
otherwise.

Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Reviewed-on: http://gerrit.cloudera.org:8080/5939
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/convert-timestamp-benchmark.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/service/fe-support.cc
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
A tests/query_test/test_parquet_timestamp_compatibility.py
28 files changed, 850 insertions(+), 74 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>

Mime
View raw message