impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <>
Subject [Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Date Fri, 25 Aug 2017 01:33:53 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

Patch Set 1:


Got some high level comments. I'm still trying to understand the life-cycle of the delta log
on the Catalog server, might need some from you :)
File be/src/catalog/

Line 300: void CatalogServer::BuildTopicUpdates(const vector<TCatalogObject>& catalog_objects,
IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAllCatalogObjectsResponse
as input. Something like

BuildTopicUpdates(TGetAllCatalogObjectsResponse foo) {

  for (object: foo.updated_objects) {
     // do something

  for (object: foo.deleted_objects) {
     // do something


Expecting the callers pass the type of objects being sent seems flaky. Thoughts?
File be/src/service/

PS1, Line 1406: batch_size_bytes
Looks like we didn't account this earlier for deletions. This logic makes more sense to me.
Ofcourse it doesn't affect the end output, just an edge case when there are tons of deletions.
File common/thrift/StatestoreService.thrift:

Line 92
- I kind of like this separation of updates and deletions in different lists as the code is
more readable and obvious as to how we process each of them separately. Without that we have
additional branches in every place to check if each item is deleted, especially the logic
in ImpalaServer#CatalogUpdateCallback().

- After going through the full patch, I'm wondering what is the advantage of removing topic_deletions
totally. Why not just make it a required list<TTopicItem> topic_deletions? That adds
the version no and you needn't change much of the logic. Does that not work for some reason?
File fe/src/main/java/org/apache/impala/catalog/

Line 90:   public synchronized void garbageCollect(long currentCatalogVersion) {
Although these methods are synchronized, the underlying maps aren't thread safe. So ideally
the class as such isn't thread safe (in the sense that calling garbageCollect() and addRemoved()
objects in two threads can result in an inconsistent state).

However that scenario is not possible since these parallel ops are prevented by catalogLock_
in CatalogServiceCatalog?

Line 154:   private String toCatalogObjectKey(TCatalogObject catalogObject) {
File fe/src/main/java/org/apache/impala/catalog/

Line 152:   // Log of deleted catalog objects.
> What exactly does this contain? When is it garbage collected? Add comment h
Yea, I'm trying to wrap my head around the lifecycle of delta log on a the CatalogServer side.
It helps to add more details on how it works and why the Catalog server also needs to maintain
it after your change.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-HasComments: Yes

View raw message