impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
Date Tue, 29 Nov 2016 06:01:15 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 4:

(6 comments)

Responding to comments. I'll wait for some of the larger changes before doing another round.

http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 351:       Map<String, Map<String, FileDescriptor>> newFileDescMap = Maps.newHashMap();
> I did that initially but the problem happens when one of the files is delet
Before, we used to want to reuse as much of the existing block metadata as possible since
that was an expensive operation. However, with this new design, we are already loading all
the metadata from HDFS, so there is no need to optimize this case by not reloading the metadata
(we are loading it anyway). So I think this can be simplified to just discarding the entries
from the global map and updating the global map in place. We also don't need any of the "should
we reload the metadata" logic.


Line 386:             String[] blockHostPorts = loc.getNames();
> We are using them in preconditions check in L396 and loop condition in L403
The code and comments around these is very sparse on content, so I think we should just condense
it. This method looks really long but really it is just very "fluffy", so I was hoping to
make it more content dense.


Line 431:           numHdfsFiles_ += fileDescs.size();
> I think its due to the current design. The reload is implemented as dropPar
Got it, makes sense.


Line 483:         for (HdfsPartition partition: partitions) {
> Although it looks like O(N^2), it isn't. 'partitions' correspond to each pa
Ahh, of course. You are right. I missed the local partitions var.


Line 934:     boolean isMarkedCached = isMarkedCached_;
> Yep, wanted to discuss about this. I'll ping you offline.
Looks like we used to force reloading the block locations of cached partitions. As discussed
in a comment above, the optimization of only reloading the metadata for some files doesn't
seem to make sense anymore, so forcing the reload based on caching doesn't make anymore, either.


http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 423:     if (p.isRoot()) return false;
> That returns true for isChildPath(root, root)?
Good point. You are right.


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message