impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <>
Subject [Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Date Fri, 08 Sep 2017 05:41:38 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

Patch Set 1:

File be/src/catalog/

Line 288:         BuildTopicUpdates(catalog_objects.updated_objects, false);
> Is there a requirement/assumption that the deletions must come after the up
No, it doesn't matter.

Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& catalog_objects,
> IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAl
These two "do something" blocks are almost identical. I refactored the code a bit to make
it a bit less flaky.

Line 333:             << catalog_object.catalog_version;
> indent
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of topic entry keys
> Update comment

Line 164:       bool topic_deletions);
> What does this param do?
Updated the comment.
File be/src/scheduling/

Line 481:   item.key = host.ip;
> Use __set fn like above to be consistent
File be/src/scheduling/

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
> single line
File be/src/service/

Line 1323:       TCatalogObject catalog_object;
> Please move this right before DeserializeThriftMsg() so we can see where it

Line 1343:                      << item.key << " is "
> indentation

Line 1367:         LOG(INFO) << "Deleted item: " << item.key << " version:
" << catalog_object.catalog_version;
> That's a lot of logging, sure we need this at INFO level?
Sorry, that was for debugging.

Line 1398:             if (existing_object.catalog_version <= catalog_object.catalog_version)
> Shouldn't this be < and not <=?

PS1, Line 1406: batch_size_bytes
> Looks like we didn't account this earlier for deletions. This logic makes m

Line 1527:       if (item.deleted) {
> Is it possible that within a single list of topic_entries there is both an 
No, it's not possible to have both an addition and a deletion in the same list of topic_entries.
File be/src/statestore/

Line 485:             item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value,
> Why not just 'item.value'? The value gets copied.
Good point. Done

Line 527:           deleted_key_size_bytes -= itr->first.size();
> += ?
File common/thrift/StatestoreService.thrift:

Line 92
> - I kind of like this separation of updates and deletions in different list
By switching to TTopicItem for topic deletions there is now significant overlap between operations
performed on both added and deleted items. We could try to extract most of the common logic
in separate functions and keep the existing logic in place, but for now it seems simple enough
to have one loop over one list of items and separate the logic with an if statement when needed.
If we had to add many of these if statements in multiple places, I'd agree that splitting
the logic would have been better.

Line 97:   5: required bool is_delta;
> fix member numbering
File fe/src/main/java/org/apache/impala/catalog/

Line 56:   private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap();
> I might be wrong, but I think we can end up having the same object deleted 
Good catch, that was a bug indeed. Removed the second map and simplified the logic around
handling the deleted objects.

Line 70:    * key 'key'.
> update comment
No longer applicable. Function is removed.

Line 90:   public synchronized void garbageCollect(long currentCatalogVersion) {
> Although these methods are synchronized, the underlying maps aren't thread 
That's not correct. The essence of having synchronized functions is exactly that, to ensure
proper synchronization even though the underlying collections are not thread safe. Two threads
calling garbageCollect and addRemoved cannot both proceed, one will grab the lock on the CatalogDeltaLog
object and make progress while the other will simply block. In any case, this class is simplified
and it only contains one collection.

Line 154:   private String toCatalogObjectKey(TCatalogObject catalogObject) {
> static?

Line 163:         return "FUNCTION:" + catalogObject.getFn().getName() + "(" +
> the function name has db and name, let's print db.fnname similar to what we
Db is already included in the function name.

Line 173:       default: return null;
> IllegalStateException?
File fe/src/main/java/org/apache/impala/catalog/

Line 152:   // Log of deleted catalog objects.
> Yea, I'm trying to wrap my head around the lifecycle of delta log on a the 

Line 152:   // Log of deleted catalog objects.
> What exactly does this contain? When is it garbage collected? Add comment h

Line 349:   // versions >= 'fromVersion'. The catalog objects that satisfy that condition
are added
> the version condition here is ">=" but the deltaLog_.retrieveObjects() uses

Line 358:       deltaLog_.deleteRemovedObject(catalogDb);
> Is the deleteRemovedObject() API really needed? If there exists an object t
File fe/src/main/java/org/apache/impala/service/

Line 1084:     resp.result.setVersion(dataSource.getCatalogVersion());
> Shouldn't we increment the catalog version for a deleted object? Otherwise 
That already happens in catalog_.removeDataSource(). We're not really consistent in the way
we handle version increment and assignment (i.e. it happens in multiple places today) and
this is something that we will eventually have to fix.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-HasComments: Yes

View raw message