impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Date Tue, 22 Aug 2017 22:44:06 GMT
Alex Behm has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
......................................................................


Patch Set 1:

(22 comments)

Very helpful explanation in the commit msg! Approach looks sane. Some high-level comments.

http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 288:         BuildTopicUpdates(catalog_objects.updated_objects, false);
Is there a requirement/assumption that the deletions must come after the updates?


Line 333:             << catalog_object.catalog_version;
indent


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of topic entry keys
sent
Update comment


Line 164:       bool topic_deletions);
What does this param do?


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 481:   item.key = host.ip;
Use __set fn like above to be consistent


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
single line


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1323:       TCatalogObject catalog_object;
Please move this right before DeserializeThriftMsg() so we can see where it's used/populated


Line 1343:                      << item.key << " is "
indentation


Line 1367:         LOG(INFO) << "Deleted item: " << item.key << " version:
" << catalog_object.catalog_version;
That's a lot of logging, sure we need this at INFO level?


Line 1398:             if (existing_object.catalog_version <= catalog_object.catalog_version)
{
Shouldn't this be < and not <=?

My understanding is that we should never have a deletion entry with exactly the same version
as an existing object, so might even add a DCHECK for the == case


Line 1527:       if (item.deleted) {
Is it possible that within a single list of topic_entries there is both an addition and a
deletion for the same thing? Which one wins?


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

Line 485:             item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value,
Why not just 'item.value'? The value gets copied.


Line 527:           deleted_key_size_bytes -= itr->first.size();
+= ?


http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 97:   5: required bool is_delta;
fix member numbering


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:   private Map<String, Long> removedCatalogObjectVersionsByKey_ = Maps.newHashMap();
I might be wrong, but I think we can end up having the same object deleted multiple times
with different versions. Consider several add/drop of the same object between a statestore
topic update.

In the current scheme, I believe some objects may never get garbage collected because the
object/version relationship is not 1:1.


Line 70:    * key 'key'.
update comment


Line 163:         return "FUNCTION:" + catalogObject.getFn().getName() + "(" +
the function name has db and name, let's print db.fnname similar to what we do for a table


Line 173:       default: return null;
IllegalStateException?


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 152:   // Log of deleted catalog objects.
What exactly does this contain? When is it garbage collected? Add comment here or in class
comment.


Line 349:   // versions >= 'fromVersion'. The catalog objects that satisfy that condition
are added
the version condition here is ">=" but the deltaLog_.retrieveObjects() uses ">"


Line 358:       deltaLog_.deleteRemovedObject(catalogDb);
Is the deleteRemovedObject() API really needed? If there exists an object that also has an
entry in the delta log, then the version in the delta log is definitely < currentCatalogVersion
so the deltaLog_.garbageCollect() in L341 should naturally purge those invalid entries.

The deleteRemovedObject() API seems to complicate the delta log and would be great if we did
not need it.


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1084:     resp.result.setVersion(dataSource.getCatalogVersion());
Shouldn't we increment the catalog version for a deleted object? Otherwise we may have the
same object with the same version be both deleted and not deleted. Seems easier to keep incrementing
the version when anything changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message