Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E51EC200B9D for ; Thu, 29 Sep 2016 01:55:26 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id E38FA160AD4; Wed, 28 Sep 2016 23:55:26 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 0D640160AD3 for ; Thu, 29 Sep 2016 01:55:25 +0200 (CEST) Received: (qmail 21029 invoked by uid 500); 28 Sep 2016 23:55:25 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 21017 invoked by uid 99); 28 Sep 2016 23:55:25 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Sep 2016 23:55:25 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9AC3CC05C0 for ; Wed, 28 Sep 2016 23:55:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id NxP2SRoYPccG for ; Wed, 28 Sep 2016 23:55:22 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 117335FB09 for ; Wed, 28 Sep 2016 23:55:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u8SNtKw4019952; Wed, 28 Sep 2016 23:55:20 GMT Message-Id: <201609282355.u8SNtKw4019952@ip-10-146-233-104.ec2.internal> Date: Wed, 28 Sep 2016 23:55:20 +0000 From: "Dimitris Tsirogiannis (Code Review)" To: Attila Jeges , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Bharath Vissapragada , Michael Ho , Lars Volker Reply-To: dtsirogiannis@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-1670=2CIMPALA-4141=3A_Support_multiple_partitions_in_ALTER_TABLE_ADD_PARTITION=0A?= X-Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 X-Gerrit-ChangeURL: X-Gerrit-Commit: 4ba5f4977c3fb711b1c613eecfe40c0efbbd8991 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Wed, 28 Sep 2016 23:55:27 -0000 Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION ...................................................................... Patch Set 14: (14 comments) Very nice! Minor nits in the code, some comments in testing. http://gerrit.cloudera.org:8080/#/c/4144/14/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: PS14, Line 1629: . " or null if the table is not altered because all the partitions already exist and IF NOT EXISTS is specified". PS14, Line 1635: If all partitions exist in : * catalog cache, return null. remove PS14, Line 1714: It is assumed that all Partition objects in 'aList' and 'bList' belong to the same : * table. I don't think this function is making use or enforcing this assumption anywhere. So, you may want to remove this sentence. PS14, Line 1753: lists of partition values nit: maybe it's better to say "maps partitions (identified by their partition values) to their corresponding HDFS caching ops." http://gerrit.cloudera.org:8080/#/c/4144/14/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java: PS14, Line 220: AnalysisError("alter table functional.alltypes add " + cl + : " partition(year=2050, month=10)" + : " partition(year=2050, month=11)" + : " partition(Month=10, YEAR=2050) location" + : " 'hdfs://localhost:20500/test-warehouse/alltypes/year=2010/month=10'", : "Duplicate partition spec: (month=10, year=2050)"); I think this case simply enforces the unique partition specs (similar to the one in L215). So I am not sure if LOCATION adds anything more in terms of test coverage in this case. Maybe just keep the previous case? Line 246: } Also you may want to add a test case where one of the partitions is cached in a non-existent pool. http://gerrit.cloudera.org:8080/#/c/4144/14/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: PS14, Line 172: def impala_partitions(self, table_name, *include_fields): A few thoughts on this function: 1. The field positions seem to be hardcoded based on the current output of SHOW PARTITIONS. Any change in the output will break the tests that rely on this function. 2. "impala_partitions" doesn't really indicate what to expect from this function. Maybe "get_impala_partitions" or "get_impala_partition_info" is more appropriate. 3. I find it awkward that by default it returns the partition values and all other fields are optional. Personally, I'd change it so that, if include_fields is not specified, return all fields. Otherwise, return only those fields in 'include_fields'. PS14, Line 174: Impala sees them maybe: "returned by a SHOW PARTITIONS statement." http://gerrit.cloudera.org:8080/#/c/4144/14/tests/metadata/test_hms_integration.py File tests/metadata/test_hms_integration.py: Line 269: Passes 'command' to 'engine' callable and makes sure that it raises an exception. Thanks for adding the comment. Maybe an example of an 'engine' callable (in terms of an existing Python class that can be used) would also help future readers understand how to use this. Line 701: "Partition already exists") You may want to add an assert here to verify that no partitions were added to Impala. PS14, Line 713: was left alone Even partitions need to be left alone from time to time :). Maybe "was not modified." PS14, Line 714: # Sometimes metadata load is triggered here, so compare only the first three : # returned partitions. Not sure I follow this comment. Can you explain the non-deterministic behavior wrt metadata load? PS14, Line 770: def test_partitions_exist_after_refresh(self, vector): I don't think this is an integration test. You may want to move it to test_partition_metadata.py and also cover the 'invalidate metadata' case. Also, another integration test case that is currently not covered is adding partitions using Impala and checking that they are accessible through Hive. Line 800: table_name).get_data().split('\n') Also try to add some new and existing partitions using Impala (maybe with location and cached properties) and verify that new partitions are added correctly and existing partitions are not modified. You may want to test w and w/o IF NOT EXISTS. -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes