impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Tue, 11 Apr 2017 22:00:02 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 6:

(10 comments)

I'm pretty happy with this change.

I think we should consider adding test cases for interesting boundary conditions, e.g.,
when the tz -> UTC conversion is ambiguous, but not in this patch.
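One way to see where the tz -> UTC conversion becomes ambiguous is a "fall back" transition, where one local wall-clock hour occurs twice. The sketch below uses a hypothetical toy zone (UTC-7 before 2017-11-05 09:00 UTC and UTC-8 afterwards, loosely modeled on US Pacific time), not Impala's or Hive's time-zone code. It lists every UTC instant that maps to a given local time, and more than one result marks an ambiguous input.

```python
from datetime import datetime, timedelta

# Hypothetical toy zone with a single "fall back" transition, loosely modeled
# on US Pacific time: UTC-7 before 2017-11-05 09:00 UTC, UTC-8 afterwards.
TRANSITION_UTC = datetime(2017, 11, 5, 9, 0)
OFFSET_BEFORE = timedelta(hours=-7)
OFFSET_AFTER = timedelta(hours=-8)


def utc_to_local(utc: datetime) -> datetime:
    """Map a naive UTC instant to naive wall-clock time in the toy zone."""
    offset = OFFSET_BEFORE if utc < TRANSITION_UTC else OFFSET_AFTER
    return utc + offset


def local_to_utc_candidates(local: datetime) -> list:
    """Return every UTC instant whose wall-clock time in the toy zone is `local`.

    More than one result means the tz -> UTC conversion is ambiguous.
    """
    candidates = []
    for offset in (OFFSET_BEFORE, OFFSET_AFTER):
        utc = local - offset
        # Keep the candidate only if converting it back reproduces `local`.
        if utc_to_local(utc) == local:
            candidates.append(utc)
    return candidates


# 01:30 local occurs twice on the transition day; 12:00 occurs once.
print(local_to_utc_candidates(datetime(2017, 11, 5, 1, 30)))
print(local_to_utc_candidates(datetime(2017, 11, 5, 12, 0)))
```

A test for this boundary would need to pin down which of the two candidates the system under test is expected to pick.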

http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 113:           "Invalid time zone in the the '%s' table property: %s",
Typo: double 'the'.


http://gerrit.cloudera.org:8080/#/c/5939/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 665:     // Attempt to set 'parquet.mr.int96.write.zone' table property. Positive case.
Let's move all the CREATE/ALTER tests into a separate TestParquetMrInt96WriteZone() test method.

To me that organization seems more natural.


Line 1882:         "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
Single quotes would be easier to read.


Line 1904:         "\"/test-warehouse/alltypesagg_hive_13_1_parquet/" +
Single quotes would be easier to read.


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

Line 27:   '''Hive writes timestamps in Parquet files by first converting values from local time
Thank you! This comment is very informative and well written.
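A minimal sketch of the round trip that docstring describes, assuming Hive converts a local wall-clock value to UTC on write and that a reader which knows the writer's zone (the role the 'parquet.mr.int96.write.zone' table property is meant to play) converts it back. The fixed UTC-8 zone and both function names are hypothetical; this is not Impala's or Hive's actual code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset writer zone (UTC-8). A real zone such as
# America/Los_Angeles would also apply DST rules, which this sketch ignores.
WRITER_ZONE = timezone(timedelta(hours=-8))


def hive_style_write(local_value: datetime) -> datetime:
    """Treat a naive value as wall-clock time in WRITER_ZONE and return naive UTC."""
    aware = local_value.replace(tzinfo=WRITER_ZONE)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def read_with_write_zone(stored_utc: datetime, write_zone: timezone) -> datetime:
    """Convert a stored naive UTC value back to wall-clock time in write_zone."""
    aware = stored_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(write_zone).replace(tzinfo=None)


original = datetime(2017, 4, 11, 15, 0)
stored = hive_style_write(original)                   # 2017-04-11 23:00 (UTC)
restored = read_with_write_zone(stored, WRITER_ZONE)  # 2017-04-11 15:00
print(original, stored, restored)
```

A reader that returns the stored value without converting it shows 23:00 instead of 15:00, which is the kind of mismatch between Hive and Impala that the change title refers to.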


Line 105:     parquet_fn = get_fs_path(
What does "fn" stand for? I'm thinking "file name", but this is not just a file name.


Line 123:           i ON i.id = h.id AND i.day = h.day  -- serves as a unique key
Easier to read with the alias 'i' next to the table.


Line 125:           (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
Simplify the first two conditions with:

(h.timestamp_col IS NULL) != (i.timestamp_col IS NULL)

Please apply the same changes to the queries in test_parquet_timestamp_compatibility.py.
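IS NULL always evaluates to TRUE or FALSE (never NULL), so comparing two IS NULL results with != acts as an exclusive OR: it is true exactly when one side is NULL and the other is not. The sketch below models each IS NULL result as a Python bool and checks the rewrite against all four combinations. Only the first condition is quoted above, so the second condition is assumed to mirror it with the two sides swapped.

```python
from itertools import product


def original_predicate(h_is_null: bool, i_is_null: bool) -> bool:
    # Assumed shape of the first two conditions in the test query:
    # (h IS NULL AND i IS NOT NULL) OR (h IS NOT NULL AND i IS NULL)
    return (h_is_null and not i_is_null) or (not h_is_null and i_is_null)


def simplified_predicate(h_is_null: bool, i_is_null: bool) -> bool:
    # (h IS NULL) != (i IS NULL): true exactly when one side is NULL.
    return h_is_null != i_is_null


def predicates_agree() -> bool:
    """Check both forms on every NULL / non-NULL combination."""
    return all(
        original_predicate(h, i) == simplified_predicate(h, i)
        for h, i in product([False, True], repeat=2)
    )


print(predicates_agree())  # True
```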


http://gerrit.cloudera.org:8080/#/c/5939/6/tests/query_test/test_parquet_timestamp_compatibility.py
File tests/query_test/test_parquet_timestamp_compatibility.py:

Line 78:   def test_garbage_parquet_mr_write_zone(self, vector, unique_database):
Suggest renaming to test_invalid_parquet_mr_write_zone.


Line 118:       # 'parquet.mr.int96.write.zone' table property to tz_name triggers  a 'UTC' ->
Extra space after "triggers".


-- 
To view, visit http://gerrit.cloudera.org:8080/5939

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes
