impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1670: Support multiple partitions in ALTER TABLE ADD PARTITION
Date Wed, 14 Sep 2016 18:38:19 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670: Support multiple partitions in ALTER TABLE ADD PARTITION
......................................................................


Patch Set 8:

(15 comments)

Flushing out some comments. Haven't looked at the tests yet.

http://gerrit.cloudera.org:8080/#/c/4144/8/fe/src/main/java/com/cloudera/impala/analysis/AlterTableAddPartitionStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/AlterTableAddPartitionStmt.java:

PS8, Line 20: import java.util.List;
            : import java.util.Set;
nit: let's try to keep the import sorted if possible (move them below L 33)


PS8, Line 59: if (getDb() != null) {
            :       sb.append(getDb() + ".");
            :     }
single line


http://gerrit.cloudera.org:8080/#/c/4144/8/fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java
File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java:

PS8, Line 3: import java.util.List;
           : import java.util.Set;
           : import java.util.ArrayList;
           : import java.util.Collections;
           : import java.util.Comparator;
nit: unordered imports


PS8, Line 189: represantation
typo: representation


http://gerrit.cloudera.org:8080/#/c/4144/8/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS8, Line 375: // If 'IfNotExists' is false and there is a conflict with the partitions that
             :           // already exist in Hive, then:
             :           // - leave existing partitions intact
             :           // - don't add any new partitions to the table
             :           // - throw ImpalaRuntimeException.
             :           // If 'IfNotExists' is true, conflicts are hadled as follows:
             :           // - If some (but not all) partitions already exist in Hive, then
ignore
             :           // the partitions that already exist and add the rest.
             :           // - If all of the partitions already exist, ignore them and return
null.
I believe this comment section is almost duplicate to the one in alterTableAddPartitions function.
You can remove this comment block.


PS8, Line 1642: alterTableCachePartitions
can you swap this function with alterTableAddPartitions? (top down reading of the code usually
helps :))


PS8, Line 1686: hmsPartitions
Will this try to alter all the partitions even though only one has a cache op specified? Also,
again you're using hmsPartitions which is the result of add_partitions call. What if the partition
exists in HMS, not in cache but the ALTER statement indicates that the added partition should
be cached. I am not sure the end result will be the expected one in this case (i.e. the partition
being added in catalog cache and a cache directive being added). Can you plz verify and/or
add tests?


PS8, Line 1696: Hive
nit: replace Hive with HMS


PS8, Line 1718: hmsPartitionValues2CachingOp
An alternative for hmsPartitionValues2CachingOp would be partitionCachingOpMap.


PS8, Line 1718: ImmutableList
Why using ImmutableLists here? These variables are never exposed outside this function and
there are no concurrent accesses here (you hold a lock on the tbl).


PS8, Line 1720:  
nit: extra space


Line 1727:       } else {
You may add a "continue;" at the end of the if block and skip the "else"


PS8, Line 1733: ImmutableList.copyOf(hmsPartition.getValues()
Partitions have unique ids, you don't need to use strings constructed from the partition values.


PS8, Line 1745: catch (AlreadyExistsException e) {
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, "add_partitions"), e);
              :       } catch (TException e) {
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, "add_partitions"), e);
              :       }
You don't really treat these two exceptions differently. Since Java 7, you may use a multi-catch
to eliminate duplicate code:
catch (AlreadyExistsException | TException e) {...}. Also how about the other exceptions that
his call may throw? Are only these two?


PS8, Line 1766: for (Partition partition: hmsPartitions) {
              :         // Create and add the HdfsPartition. Return the table object with
an updated
              :         // catalog version.
              :         addHdfsPartition(tbl, partition);
              :       }
I think there is a scenario that may result in an inconsistent state. Say you want to add
3 partitions p1, p2 and p3 but someone (e.g. HIVE) has already added p2 without Impala knowing
about it (i.e. it's not in the cache). What will happen with your code is that the result
of add_partitions call in L1743 will return only p1 and p3 (added partitions) and these are
the partitions you will add to the catalog cache (L1766-1769). Even though the stmt will succeed
Impala will be unaware of p2 and there is nothing to indicate what went wrong. The only option
will be to run refresh or invalidate.


-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message