impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Sun, 30 Oct 2016 00:49:11 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

Patch Set 19:

File fe/src/main/java/org/apache/impala/analysis/

Line 51:     for (PartitionParams p: partitions_) {
style: extra after 'partitions')
File fe/src/main/java/org/apache/impala/analysis/

Line 31:  * Represents the partial SQL statement of specifying a table partition
Confusing leading sentence (what's a partial SQL statement?). How about something like this:

Represents a partition definition used in ALTER TABLE ADD PARTITION consisting of partition
key-value pairs and an optional location and optional caching options.

Line 36: public class PartitionParams implements ParseNode {
"Params" is pretty generic and could easily be confused with other "Partition*" stuff. How
about PartitionDef?

Line 38:   private final PartitionSpec partitionSpec_;
nit: order members as in the SQL clause, i.e, spec, loc, cacheOp
File fe/src/main/java/org/apache/impala/analysis/

Line 208:   public String getComparableStringRep() {
File fe/src/main/java/org/apache/impala/service/

Line 1767:       if (catalog_.containsHdfsPartition(tableName.getDb(), tableName.getTbl(),
Do we even need this step of checking the partitions against the local (possibly stale) cache?
This might cause the table to be loaded which is expensive and seems unnecessary.

We eventually have to deal with the "truly" added partitions based on the RPC response from
the HMS anyway.

Line 1793:         addedHmsPartitions = msClient.getHiveClient().add_partitions(hmsPartitionsToAdd,
Please file a JIRA to adhere to MAX_PARTITION_UPDATES_PER_RPC in this batched RPC to the HMS.
We've seen problems in the past when updating a lot of metadata in one RPC to the HMS.

For now, please enforce during analysis that one ALTER TABLE ADD PARTITION can only add at

Line 1801:       alterTableCachePartitions(msTbl, msClient, tableName, addedHmsPartitions,
Weird that we have two round trips to the HMS here. Is it reasonable to get the work done
in a single add_partitions() RPC?

If not, it would be good to leave a brief comment or TODO.

I don't think we need to hold up the patch for this, I just want to understand the reasoning/issue

Line 1806:       // catalog the partitions that already exist in HMS but aren't in the catalog
I understand that this is necessary, but it's weird in combination with L1767 where we may
just have loaded the entire table metadata, but can still be inconsistent down here. I think
that strengthens the case for getting rid of the check in L1767.

Also do we even need to load those partitions? Is the expectation that after the ALTER all
the affected partitions are loaded?

Line 1846:         part = msClient.getHiveClient().getPartition(part.getDbName(),
Fetch them in bulk using getPartitionsByNames()

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-HasComments: Yes

View raw message