impala-reviews mailing list archives

From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Thu, 20 Oct 2016 15:29:14 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4144/14/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

PS14, Line 714: # Sometimes metadata load is triggered here, so compare only the first three
              :         # returned partitions.
> I am not sure this explanation and the fix is correct. SYNC_DDL causes a me
I did some additional investigation. You are correct that setting SYNC_DDL is not directly
related to the issue; it just made the issue easier to reproduce.

(A) The issue can be reproduced manually with the Hive and Impala shells, so this is definitely
not a bug in the Python tests.

Steps to reproduce:

[0] Start Impala and Hive shells side-by-side.

[1] impala> create table ptbl (a int) partitioned by (x int);

[2] impala> show partitions ptbl;

[3] hive> alter table ptbl add partition (x=1);

[4] impala> show partitions ptbl;

[5] impala> set sync_ddl=1; (or wait a few seconds between [6] and [7])

[6] impala> alter table ptbl add partition (x=2) cached in 'testPool';

[7] impala> show partitions ptbl;

In [2] and [4], the list of partitions is empty.

In [7], the list of partitions should contain only (x=2), but it contains both (x=1) and (x=2).


(A.1) This behavior can be reproduced without the IMPALA-1670-related changes, so it is
not something that IMPALA-1670 introduced.

(A.2) Setting SYNC_DDL in [5] is not necessary; we get the same behavior if we instead wait
a few seconds between [6] and [7].

(A.3) If we run [6] without any caching options, Impala works as expected, and in [7] the list
of partitions contains only (x=2).


(B) In the asf-gerrit/master branch (without the IMPALA-1670 change-set):

CatalogOpExecutor.alterTableAddPartition() calls CatalogServiceCatalog.watchCacheDirs() with
cache directive IDs (L1704).

The comments for CatalogServiceCatalog.watchCacheDirs() state: "Adds a list of cache directive
IDs for the given table name. Asynchronously refreshes the table metadata once all cache directives
complete."
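
To make the interaction between the steps in (A) and the quoted comment concrete, here is a
toy Python model. It is only a sketch: ToyMetastore, ToyCatalog and reproduce() are
hypothetical names invented for illustration, not Impala or Hive APIs, and the model assumes
that adding a cached partition schedules a full refresh of the table from the metastore,
as the comment above suggests.

```python
# Toy model only: every class and function name here is hypothetical.

class ToyMetastore:
    """Stands in for the Hive Metastore, the source of truth for partitions."""

    def __init__(self):
        self.partitions = set()


class ToyCatalog:
    """Stands in for Impala's cached view of a single table."""

    def __init__(self, metastore):
        self.metastore = metastore
        self.cached_partitions = set()
        self.pending_refresh = False

    def add_partition(self, spec, cache_pool=None):
        # DDL issued through Impala updates the metastore and the cached view.
        self.metastore.partitions.add(spec)
        self.cached_partitions.add(spec)
        if cache_pool is not None:
            # Assumption: a cached partition schedules a later table refresh,
            # mirroring the watchCacheDirs() comment quoted above.
            self.pending_refresh = True

    def complete_cache_directives(self):
        # Models the asynchronous refresh firing (after SYNC_DDL or a short wait).
        if self.pending_refresh:
            self.cached_partitions = set(self.metastore.partitions)
            self.pending_refresh = False

    def show_partitions(self):
        return sorted(self.cached_partitions)


def reproduce(cache_pool):
    hms = ToyMetastore()
    catalog = ToyCatalog(hms)
    hms.partitions.add("x=1")                  # step [3]: added through Hive
    before = catalog.show_partitions()         # step [4]
    catalog.add_partition("x=2", cache_pool)   # step [6]
    catalog.complete_cache_directives()        # step [5]: SYNC_DDL or a wait
    return before, catalog.show_partitions()   # step [7]
```

Under these assumptions, reproduce("testPool") returns ([], ["x=1", "x=2"]), matching the
behavior seen in [7], while reproduce(None) returns ([], ["x=2"]), matching (A.3).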

So, this is what triggers the metadata load. How do you think I should proceed?


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-HasComments: Yes
