impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vihang Karajgaonkar (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level
Date Thu, 23 Jul 2020 19:05:35 GMT
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16159
)

Change subject: IMPALA-3127: Support incremental metadata updates in partition level
......................................................................


Patch Set 4:

(6 comments)

Thanks for the updated patch. The patch looks mostly good from my side except for few minor
suggestions below.

http://gerrit.cloudera.org:8080/#/c/16159/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16159/4/common/thrift/CatalogObjects.thrift@292
PS4, Line 292: list<Exprs.TExpr>
curious to know if it is intentional that this field is not optional? Also, adding a comment
here to say that fields are optional because THdfsPartition are minimally filled as required
when topic mode is minimal.


http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@748
PS4, Line 748: setId
Is the delete flag really needed in this method? For instance, do you think its cleaner if
we simply set the following here irrespective of the delete flag value:

partObject.setId(obj.hdfs_partition.id);
partObject.setPrevId(obj.hdfs_partition.prev_id);

and on the local coordinator side, if the HdfsPartition topic entry has delete flag = true
we invalidate the the HdfsPartition using the id, else we invalidate the HdfsPartition using
prev_id; I understand in that case the invalidatePartition method in the CatalogMetaProvider
will have to take in the delete flag but that way we are not leaking implementation details
from local coordinator to catalog side.


http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@750
PS4, Line 750:           // This is an update on a new partition (!isDelete && prev_id
== -1).
             :           // Nothing to invalidate.
             :           return null
not sure if you can fully avoid this issue anyways. For instance, if a partition is updated
twice between topic updates, the second instance of the partition will have the prev_id set.


http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@93
PS4, Line 93: The catalog versions are
            :  * not used actually.
Should we Override the getCatalogVersion and setCatalogVersion methods to throw UnsupportedOperationException
instead so that they are not accidently used in future?


http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/4/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@399
PS4, Line 399: // Ignore partition deletions in catalog-v1.
may be add more specific message here:
// we ignore partition deletions in catalog-v1 mode since they are handled during table updates
to atomically update the table.


http://gerrit.cloudera.org:8080/#/c/16159/4/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/16159/4/tests/custom_cluster/test_local_catalog.py@187
PS4, Line 187: id=%%d
Do we have to verify the partition ids? Shouldn't the message with the partition name be sufficient?
Also, another way to confirm invalidation is be making sure the the next query is loading
the partitions again by confirming if the metric counters for the cache hit/misses in the
profile are incremented correctly.



-- 
To view, visit http://gerrit.cloudera.org:8080/16159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <huangquanlong@gmail.com>
Gerrit-Reviewer: Anurag Mantripragada <anurag@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanlong@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vihang@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Jul 2020 19:05:35 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message