impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Tue, 04 Apr 2017 06:22:27 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 169:   private static void analyzeParquetMrWriteZone(Table table,
These cases need tests in AnalyzeDDL


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 183:     if (getTblProperties().containsKey(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE))
{
These cases need tests in AnalyzeDDL


http://gerrit.cloudera.org:8080/#/c/5939/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 54: import org.apache.impala.common.InternalException;
not needed


http://gerrit.cloudera.org:8080/#/c/5939/5/tests/custom_cluster/test_parquet_timestamp_compatibility.py
File tests/custom_cluster/test_parquet_timestamp_compatibility.py:

Line 25: class TestParquetTimestampCompatibility(CustomClusterTestSuite):
Test needs a comment. In particular, what is covered here and what is covered in integration
tests not part of Impala tests.


Line 26:   TEST_DB_NAME = 'timezone_test_db'
Why not use the unique_database fixture? That's the best practice.


Line 39:     self.client = impalad.service.create_beeswax_client()
I think we already have a self.client from ImpalaTestSuite.


Line 49:         "/test-warehouse/alltypesagg_hive_13_1_parquet/alltypesagg_hive_13_1.parquet"
We need the appropriate filesystem prefix here. This will break on local fs or S3 builds.
Take a look at:

filesystem_utils.py get_fs_path()

and how that is used elsewhere


Line 56:     self.client.execute('USE ' + self.TEST_DB_NAME)
Why? Better to avoid changing the session state


Line 59:     self.client.execute('INVALIDATE METADATA')
why?


Line 61:   def _set_impala_table_timezone(self, timezone):
simplify to pass table name as param


Line 71:   @pytest.mark.execute_serially
How long does this test take? I'm thinking we should only have minimal tests for the impalad
startup option in a custom cluster test, probably in the existing test for convert_legacy_hive_parquet_utc_timestamps.


The other tests should go into a regular test (subclass of ImpalaTestSuite) and run in parallel.


Line 81:     statement = '''ALTER TABLE hive_table
Should be an analyzer test in AnalyzeDDL


Line 104:     select_from_impala_table = '''SELECT timestamp_col FROM impala_table WHERE id
= 1'''
Can you think of a way to validate all values in the table in bulk e.g. using our existing
timestamp conversion functions?
Having a few example values is still good, but we should also test that we are internally
consistent with out conversion functions.


Line 113:     self._set_impala_table_timezone('EST')
Add some comments about the behavior here. We are getting different values for the same timezone
because the underlying Parquet files were written by Hive/Impala.


Line 141:     statement = '''ALTER TABLE hive_table
analyzer test is more appropriate (also fix elsewhere)


Line 185:   def test_ddl(self, vector):
We need a test where Hive sets a garbage timezone and Impala throws an error when analyzing
a query against that table.

We also need a test that shows what happens when a table property timezone is set and the
convert_legacy_hive_parquet_utc_timestamps option is used. Probably good to consolidate with
the existing custom cluster test.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message