impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Wed, 28 Sep 2016 23:55:20 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
......................................................................


Patch Set 14:

(14 comments)

Very nice! Minor nits in the code, some comments in testing.

http://gerrit.cloudera.org:8080/#/c/4144/14/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS14, Line 1629: .
" or null if the table is not altered because all the partitions already exist and IF NOT
EXISTS is specified".


PS14, Line 1635: If all partitions exist in
               :    * catalog cache, return null.
remove


PS14, Line 1714: It is assumed that all Partition objects in 'aList' and 'bList' belong to
the same
               :    * table.
I don't think this function is making use or enforcing this assumption anywhere. So, you may
want to remove this sentence.


PS14, Line 1753: lists of partition values
nit: maybe it's better to say "maps partitions (identified by their partition values) to their
corresponding HDFS caching ops."


http://gerrit.cloudera.org:8080/#/c/4144/14/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

PS14, Line 220: AnalysisError("alter table functional.alltypes add " + cl +
              :           " partition(year=2050, month=10)" +
              :           " partition(year=2050, month=11)" +
              :           " partition(Month=10, YEAR=2050) location" +
              :           " 'hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10'",
              :           "Duplicate partition spec: (month=10, year=2050)");
I think this case simply enforces the unique partition specs (similar to the one in L215).
So I am not sure if LOCATION adds anything more in terms of test coverage in this case. Maybe
just keep the previous case?


Line 246:     }
Also you may want to add a test case where one of the partitions is cached in a non-existent
pool.


http://gerrit.cloudera.org:8080/#/c/4144/14/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS14, Line 172: def impala_partitions(self, table_name, *include_fields):
A few thoughts on this function:
1. The field positions seem to be hardcoded based on the current output of SHOW PARTITIONS.
Any change in the output will break the tests that rely on this function. 
2. "impala_partitions" doesn't really indicate what to expect from this function. Maybe "get_impala_partitions"
or "get_impala_partition_info"  is more appropriate.
3. I find it awkward that by default it returns the partition values and all other fields
are optional. Personally, I'd change it so that, if include_fields is not specified, return
all fields. Otherwise, return only those fields in 'include_fields'.


PS14, Line 174: Impala sees them
maybe: "returned by a SHOW PARTITIONS statement."


http://gerrit.cloudera.org:8080/#/c/4144/14/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

Line 269:     Passes 'command' to 'engine' callable and makes sure that it raises an exception.
Thanks for adding the comment. Maybe an example of an 'engine' callable (in terms of an existing
Python class that can be used) would also help future readers understand how to use this.


Line 701:             "Partition already exists")
You may want to add an assert here to verify that no partitions were added to Impala.


PS14, Line 713: was left alone
Even partitions need to be left alone from time to time :). Maybe "was not modified."


PS14, Line 714: # Sometimes metadata load is triggered here, so compare only the first three
              :         # returned partitions.
Not sure I follow this comment. Can you explain the non-deterministic behavior wrt metadata
load?


PS14, Line 770: def test_partitions_exist_after_refresh(self, vector):
I don't think this is an integration test. You may want to move it to test_partition_metadata.py
and also cover the 'invalidate metadata' case. Also, another integration test case that is
currently not covered is adding partitions using Impala and checking that they are accessible
through Hive.


Line 800:             table_name).get_data().split('\n')
Also try to add some new and existing partitions using Impala (maybe with location and cached
properties) and verify that new partitions are added correctly and existing partitions are
not modified. You may want to test w and w/o IF NOT EXISTS.


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message