impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Date Tue, 22 Aug 2017 22:44:06 GMT
Alex Behm has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

Patch Set 1:


Very helpful explanation in the commit msg! Approach looks sane. Some high-level comments.
File be/src/catalog/

Line 288:         BuildTopicUpdates(catalog_objects.updated_objects, false);
Is there a requirement/assumption that the deletions must come after the updates?

Line 333:             << catalog_object.catalog_version;
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of topic entry keys
Update comment

Line 164:       bool topic_deletions);
What does this param do?
File be/src/scheduling/

Line 481:   item.key = host.ip;
Use __set fn like above to be consistent
File be/src/scheduling/

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
single line
File be/src/service/

Line 1323:       TCatalogObject catalog_object;
Please move this right before DeserializeThriftMsg() so we can see where it's used/populated

Line 1343:                      << item.key << " is "

Line 1367:         LOG(INFO) << "Deleted item: " << item.key << " version:
" << catalog_object.catalog_version;
That's a lot of logging, sure we need this at INFO level?

Line 1398:             if (existing_object.catalog_version <= catalog_object.catalog_version)
Shouldn't this be < and not <=?

My understanding is that we should never have a deletion entry with exactly the same version
as an existing object, so might even add a DCHECK for the == case

Line 1527:       if (item.deleted) {
Is it possible that within a single list of topic_entries there is both an addition and a
deletion for the same thing? Which one wins?
File be/src/statestore/

Line 485:             item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value,
Why not just 'item.value'? The value gets copied.

Line 527:           deleted_key_size_bytes -= itr->first.size();
+= ?
File common/thrift/StatestoreService.thrift:

Line 97:   5: required bool is_delta;
fix member numbering
File fe/src/main/java/org/apache/impala/catalog/

Line 56:   private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap();
I might be wrong, but I think we can end up having the same object deleted multiple times
with different versions. Consider several add/drop of the same object between a statestore
topic update.

In the current scheme, I believe some objects may never get garbage collected because the
object/version relationship is not 1:1.

Line 70:    * key 'key'.
update comment

Line 163:         return "FUNCTION:" + catalogObject.getFn().getName() + "(" +
the function name has db and name, let's print db.fnname similar to what we do for a table

Line 173:       default: return null;
File fe/src/main/java/org/apache/impala/catalog/

Line 152:   // Log of deleted catalog objects.
What exactly does this contain? When is it garbage collected? Add comment here or in class

Line 349:   // versions >= 'fromVersion'. The catalog objects that satisfy that condition
are added
the version condition here is ">=" but the deltaLog_.retrieveObjects() uses ">"

Line 358:       deltaLog_.deleteRemovedObject(catalogDb);
Is the deleteRemovedObject() API really needed? If there exists an object that also has an
entry in the delta log, then the version in the delta log is definitely < currentCatalogVersion
so the deltaLog_.garbageCollect() in L341 should naturally purge those invalid entries.

The deleteRemovedObject() API seems to complicate the delta log and would be great if we did
not need it.
File fe/src/main/java/org/apache/impala/service/

Line 1084:     resp.result.setVersion(dataSource.getCatalogVersion());
Shouldn't we increment the catalog version for a deleted object? Otherwise we may have the
same object with the same version be both deleted and not deleted. Seems easier to keep incrementing
the version when anything changes.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-HasComments: Yes

View raw message