impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Date Fri, 15 Sep 2017 23:31:58 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects

Patch Set 5:

File be/src/catalog/

PS5, Line 231:   // If this is not a delta update, request an update from version 0 from the
             :   // catalog. There is an optimization that checks if pending_topic_updates_
was just
             :   // reloaded from version 0. If so, then skip this step and use that data.
I found it impossible to understand this comment without reading through most of the code
in this file, which kinda defeats the point of having a comment :) i think it would be clearer
if it said something like:

If not generating a delta update and 'pending_topic_updates_' doesn't already contain the
full catalog (beginning with version 0), then force GatherCatalogUpdatesThread() to reload
the full catalog.

PS5, Line 234: delta.from_version == 0 && delta.to_version == 0
from the comment above, presumably this condition means "this is not a delta update"?  Why
is that the condition and not 'delta.is_delta'?

PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue;
given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), why is this needed?
File be/src/service/

PS5, Line 1392: 
can we delete this function now?
File be/src/statestore/

PS5, Line 177:     entry_it->second.SetDeleted(true);
             :     entry_it->second.SetVersion(last_version_);
now that some subscribes require deleted entries to have a value, how does this all work?
are we implicitly requiring that only non-transient subscribers depend on values for deletions?
 If that's the case, that seems fragile (and at the least should be documented).

PS5, Line 490:  
would be easier to read if you line break it there to keep the last arg all on one line.

PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE)
why do we need that guard? why can't we set topic_item.value even in that case?
assuming we don't need it, it seems like we can remove the NULL_VALUE constant altogether.

PS5, Line 546:         int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes
             :             + topic.total_value_size_bytes();
that math doesn't look correct anymore (deleted entries can have non-zero value size).  it
might be more straight forward to just add up the topic size as we go rather than try to subtract
out the deleted portion.
File be/src/statestore/statestore.h:

PS5, Line 180: may
may or will?
File common/thrift/CatalogInternalService.thrift:

PS5, Line 25: GetAllCatalogObjects
see below. I think we should rename this.

PS5, Line 25: Contains all known Catalog objects
            : // (databases, tables/views, and functions) from the CatalogService's cache.
I think that should be updated to explain that this is relative to a particular catalog version
and that it now expresses it as an update and delete.

PS5, Line 38: empty list if no objects detected in the Catalog
> Yes, there is some context that is missing here, hence it's not easy to und
Thanks. As we talked about in person, I think we should rename GetAllCatalogObjects to GetCatalogDelta()
or something similar to make it clear that this gets the delta. There's no hint at that until
you read the header file comment.
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted catalog objects).
> Let me try to explain. Statestore only uses that flag to avoid sending dele
Please see my comments elsewhere in this file for suggestions on clarifying the interface.

It still seems confusing and weird that 'value' can matter for delete when delete, but I guess
the comments I'm suggested elsewhere help clarify at least what 'delete' actually means.

PS5, Line 87: If true, this item was deleted
How about:

If true, this item was deleted.  When false, this TTopicItem need not be included in non-delta
TTopicDelta's (since the complete set of TTopicItems will be included in that case).

or something like that.

PS5, Line 97: List of changes to topic entries, including deletions.
That's only true when is_delta==true.  So, how about we say:

When is_delta=true, a list of changes to topic entries, including deletions, within [from_version,
When is_delta=false, this is the list of all non-delete topic entries for [0, to_version],
which can be used to reconstruct the topic from scratch.

PS5, Line 100: applied to in-memory state,
that's kinda dependent on the subscriber's implementation, right? would it be clearer to say:

relative to the topic at versions [0, from_version).

or something like that?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Vuk Ercegovac <>
Gerrit-HasComments: Yes

View raw message