impala-reviews mailing list archives

From "Attila Jeges (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Wed, 23 Nov 2016 18:05:12 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

Patch Set 20:


I'm sorry it took this long to answer your comments; I've been working on other things for a couple of weeks.
File fe/src/main/java/org/apache/impala/analysis/

Line 51:       p.setTableName(tableName);
> style: extra after 'partitions')
What do you mean? It is an underscore, not a space, after 'partitions', and there is only one
space after the closing ')'.

PS19, Line 73: TAlterTableParams params = super.toThrift();
             :     params.setAlter_type(TAlterTableType.ADD_PARTITION);
> I think you can simply use TAlterTableParams.AddToPartition_params(). It al
File fe/src/main/java/org/apache/impala/analysis/

Line 31
> Confusing leading sentence (what's a partial SQL statement?). How about som

Line 36
> "Params" is pretty generic and could easily be confused with other "Partiti

Line 38
> nit: order members as in the SQL clause, i.e, spec, loc, cacheOp
File fe/src/main/java/org/apache/impala/analysis/

Line 208
> toCanonicalString()?

PS19, Line 211: 
> nit: this is just for comparison purposes, no need to pretty print (save a 
File fe/src/main/java/org/apache/impala/service/

Line 1767:       if (!iterator.hasNext()) {
> Do we even need this step of checking the partitions against the local (pos
This step is necessary because in L1813-1817 we want to add only those partitions to catalog_
that are not in there yet.

Consider a scenario where a partition exists in catalog_ but does not exist in HMS yet.
If we skipped the check in L1767, addHdfsPartition() in L1816 would add the partition to
catalog_ again. Note that addHdfsPartition() does not check for duplicate partitions; it
will add the same partition to catalog_ twice unless we filter out duplicates beforehand.
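The filtering step can be sketched roughly as follows. All names and types here are simplified stand-ins for illustration, not the actual Impala catalog API (the real code works on HdfsPartition objects in catalog_; here a partition is just its canonical spec string):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the dedup check described above (hypothetical names).
public class PartitionDedup {
  // Keep only the specs that are not already in the local catalog.
  // The real addHdfsPartition() does not dedupe, so skipping this
  // filter would insert the same partition into catalog_ twice.
  static List<String> filterExisting(List<String> requested,
                                     Set<String> inCatalog) {
    List<String> toAdd = new ArrayList<>();
    for (String spec : requested) {
      if (!inCatalog.contains(spec)) toAdd.add(spec);
    }
    return toAdd;
  }

  public static void main(String[] args) {
    Set<String> catalog = new HashSet<>(List.of("p=1"));
    // Only "p=2" survives the filter; "p=1" is already in the catalog.
    System.out.println(filterExisting(List.of("p=1", "p=2"), catalog));
  }
}
```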

Line 1793:     org.apache.hadoop.hive.metastore.api.Table msTbl = tbl.getMetaStoreTable().deepCopy();
> Please file a JIRA to adhere to MAX_PARTITION_UPDATES_PER_RPC in this batch
Created IMPALA-4524 and added the check to AlterTableAddPartitionStmt.analyze()

Line 1801:         String partitionSpecStr = Joiner.on(", ").join(partitionSpec);
> Weird that we have two round trips to the HMS here. Is it reasonable to get
In alterTableCachePartitions() we call HdfsCachingUtil.submitCachePartitionDirective() for
each partition that needs to be cached. This submits a new caching directive for the specified
cache pool name, path and replication. It also adds the resulting directive ID to the corresponding
partition. HdfsCachingUtil.submitCachePartitionDirective() should be called for newly added
partitions only; that is, partitions that did not exist in catalog_ or HMS before.

If we wanted to get everything done in a single add_partitions() call, the HdfsCachingUtil.submitCachePartitionDirective()
calls would have to be made before add_partitions(), because we need to know the directive
IDs. Unfortunately, at that point we don't yet know which partitions preexist in HMS and which
do not, so we might end up submitting cache directives for the wrong path.

I've added a short comment.
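The ordering constraint can be illustrated with a small sketch (made-up names, not the real HdfsCachingUtil/HMS API): cache directives are submitted only after we know which partitions are genuinely new, so a preexisting partition never gets a directive for a possibly wrong path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "cache only the newly added partitions".
public class CacheAfterAdd {
  static long nextDirectiveId = 100;

  // Stand-in for HdfsCachingUtil.submitCachePartitionDirective():
  // returns the new caching directive's ID.
  static long submitCacheDirective(String path) { return nextDirectiveId++; }

  // Submit one directive per partition that does not already exist in
  // HMS; preexisting partitions are skipped entirely.
  static List<Long> addAndCache(List<String> requested, List<String> inHms) {
    List<Long> directiveIds = new ArrayList<>();
    for (String part : requested) {
      if (inHms.contains(part)) continue;  // preexisting: no new directive
      // ... the add_partitions() round trip for new partitions goes here ...
      directiveIds.add(submitCacheDirective("/warehouse/tbl/" + part));
    }
    return directiveIds;
  }

  public static void main(String[] args) {
    // "p=1" preexists in HMS, so only "p=2" gets a directive.
    System.out.println(addAndCache(List.of("p=1", "p=2"), List.of("p=1")));
  }
}
```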

Line 1806:         LOG.debug(String.format("Skipping partition creation because (%s) already " +
> I understand that this is necessary, but it's weird in combination with L17
We need the check in L1767 because in L1813-1817 we should add only those partitions to catalog_
that are not in there yet to avoid duplicates.

In L1767 we check if there are any conflicts with partitions stored in catalog_. In L1807
we check if there are any conflicts with partitions stored in HMS.

In L1807-1811 we load the partitions that exist in HMS but don't exist in catalog_ yet. We
have to load these partitions because we want to add them to catalog_ in L1813-1817.
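The two-level check can be summarized with a rough sketch (hypothetical names, not the actual Impala code): stage one drops requests already in catalog_, stage two splits the remainder into partitions to create in HMS and partitions to merely load from HMS; both groups end up added to catalog_ exactly once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the catalog-side and HMS-side conflict checks.
public class TwoStageCheck {
  // Returns two buckets: "create" (new everywhere) and "load"
  // (already in HMS, so load its metadata instead of creating it).
  static Map<String, List<String>> classify(List<String> requested,
      Set<String> inCatalog, Set<String> inHms) {
    List<String> create = new ArrayList<>();
    List<String> load = new ArrayList<>();
    for (String spec : requested) {
      if (inCatalog.contains(spec)) continue;     // L1767-style check
      if (inHms.contains(spec)) load.add(spec);   // L1807-style check
      else create.add(spec);
    }
    Map<String, List<String>> out = new HashMap<>();
    out.put("create", create);
    out.put("load", load);
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<String>> r =
        classify(List.of("a", "b", "c"), Set.of("a"), Set.of("b"));
    // "a" is skipped, "b" is loaded from HMS, "c" is created.
    System.out.println("create=" + r.get("create") + " load=" + r.get("load"));
  }
}
```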

PS19, Line 1810: 
> Let's try to be a bit more defensive here. If getPartiionsFromHms() throws 
I've changed the wording of the ImpalaRuntimeException in L1850.

Line 1846: 
> Fetch them in bulk using getPartitionsByNames()
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS19, Line 963: ---- TYPES
> You don't need the drop table statements. Plz remove
The drop table statements were necessary because all my test cases used the i1670_alter table,
and I wanted to avoid conflicts.

I've rewritten my test cases to use a unique table name and removed the drop table statements.

PS19, Line 987: Invalid value for table property
> Isn't this an analysis exception? I believe it will be an ImpalaRuntimeExce
It is an AnalysisException. I get the same exception when "IF NOT EXISTS" is not used; I've
added another test to verify this.

PS19, Line 990: # IMPALA-1740: Test setting the skip.header.line.count tblproperty
              : create table i1740_alter_4 (i1 integer) stored as parquet;
              : alter table i1740_alter_4 set tblproperties ('skip.header.line.count'='2');
              : ====
> The query that follows this comment does not use the IF NOT EXISTS clause w
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

PS19, Line 242: # IMPALA-1670: Revoke privileges, so that user would not have privileges to
> nit: long line
File tests/metadata/

PS19, Line 733: data
> nit: with_data


Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-HasComments: Yes
