impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Wed, 23 Nov 2016 18:05:12 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
......................................................................


Patch Set 20:

(18 comments)

I'm sorry it took this long to answer your comments. I've worked on other stuff for a couple
of weeks.

http://gerrit.cloudera.org:8080/#/c/4144/19/fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java:

Line 51:       p.setTableName(tableName);
> style: extra after 'partitions')
What do you mean? It is an underscore not a space after 'partitions' and there is only one
space after the closing ')'


PS19, Line 73: TAlterTableParams params = super.toThrift();
             :     params.setAlter_type(TAlterTableType.ADD_PARTITION);
> I think you can simply use TAlterTableParams.AddToPartition_params(). It al
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/fe/src/main/java/org/apache/impala/analysis/PartitionParams.java
File fe/src/main/java/org/apache/impala/analysis/PartitionParams.java:

Line 31
> Confusing leading sentence (what's a partial SQL statement?). How about som
Done


Line 36
> "Params" is pretty generic and could easily be confused with other "Partiti
Done


Line 38
> nit: order members as in the SQL clause, i.e, spec, loc, cacheOp
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
File fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java:

Line 208
> toCanonicalString()?
Done


PS19, Line 211: 
> nit: this is just for comparison purposes, no need to pretty print (save a 
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1767:       if (!iterator.hasNext()) {
> Do we even need this step of checking the partitions against the local (pos
This step is necessary because in L1813-1817 we want to add only those partitions to catalog_
that are not in there yet.

Consider a scenario when a partition exists in catalog_ but it does not exist in HMS yet.
If we skip this check in L1767, then addHdfsPartition() in L1816 would add the partition to
catalog_ again. Note that addHdfsPartition() does not check for duplicate partitions and it
will add the same partition again to catalog_ if we don't filter out duplicates beforehand.


Line 1793:     org.apache.hadoop.hive.metastore.api.Table msTbl = tbl.getMetaStoreTable().deepCopy();
> Please file a JIRA to adhere to MAX_PARTITION_UPDATES_PER_RPC in this batch
Created IMPALA-4524 and added the check to AlterTableAddPartitionStmt.analyze()


Line 1801:         String partitionSpecStr = Joiner.on(", ").join(partitionSpec);
> Weird that we have two round trips to the HMS here. Is it reasonable to get
In alterTableCachePartitions() we call HdfsCachingUtil.submitCachePartitionDirective() for
each partition that needs to be cached. This submits a new caching directive for the specified
cache pool name, path and replication. It also adds the resulting directive ID to the corresponding
partition. HdfsCachingUtil.submitCachePartitionDirective() should be called for newly added
partitions only; that is, partitions that did not exist in catalog_ or HMS before.

If we want to get everything done in a single add_partitions() call, the HdfsCachingUtil.submitCachePartitionDirective()
calls have to be made before add_partitions() because we need to know the directive IDs. Unfortunately
at that point we don't know yet which partitions are preexisting (in HMS) and which are not
and thus we might end-up submitting cache directives for the wrong path.

I've added a short comment.


Line 1806:         LOG.debug(String.format("Skipping partition creation because (%s) already
" +
> I understand that this is necessary, but it's weird in combination with L17
We need the check in L1767 because in L1813-1817 we should add only those partitions to catalog_
that are not in there yet to avoid duplicates.

In L1767 we check if there are any conflicts with partitions stored in catalog_. In L1807
we check if there are any conflicts with partitions stored in HMS.

In L1807-1811 we are loading those partitions from HMS that exist in HMS, but don't exist
in catalog_ yet. We have to load these partitions because we want to add them to catalog_
in L1813-1817.


PS19, Line 1810: 
> Let's try to be a bit more defensive here. If getPartiionsFromHms() throws 
I've changed the wording of ImpalaRuntimeException in L1850


Line 1846: 
> Fetch them in bulk using getPartitionsByNames()
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS19, Line 963: ---- TYPES
> You don't need the drop table statements. Plz remove
The drop table statements were necessary because all my test cases used i1670_alter table
and I wanted to avoid conflicts.

I've rewritten my test cases to use a unique table name and removed the drop table statements.


PS19, Line 987: Invalid value for table property skip.header.li
> Isn't this an analysis exception? I believe it will be an ImpalaRuntimeExce
It is an AnalysisException exception. I get the same exception when "IF NOT EXISTS" is not
used. I've added another test to verify this.


PS19, Line 990: # IMPALA-1740: Test setting the skip.header.line.count tblproperty
              : create table i1740_alter_4 (i1 integer) stored as parquet;
              : alter table i1740_alter_4 set tblproperties ('skip.header.line.count'='2');
              : ====
> The query that follows this comment does not use the IF NOT EXISTS clause w
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

PS19, Line 242: # IMPALA-1670: Revoke privileges, so that user would not have privileges to
run ALTER
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/4144/19/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

PS19, Line 733: data
> nit: with_data
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message