impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
Date Sat, 26 Nov 2016 23:02:24 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 27:  * This class encapsulates the logic required for mapping HDFS
> Much fluff, simplify to: Maps HDFS storage-UUIDs to per-host 0-based, seque
Done


Line 29:  * by the backend.
> required by Impala backends
Done


Line 31: public class DiskIdMapper {
> Make this a singleton to clarify the intended use.
Done


Line 36:     // This is intentionally implemented as a static object to share the storageIDs
across
> move this part to the class comment
Done


Line 38:     // consistent across all the tables, without which, we can potentially confuse
the
> also mention that the static solution requires the least amount of memory
Done


Line 48:     private static ConcurrentHashMap<String, AtomicInteger>
> Don't think we need a ConcurrentHashMap or AtomicInteger here, because when
Used a stricter locking scheme, but as we discussed offline, this can still have contention
in the initial metadata load. Added a TODO for that.


http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 294:   // Keeps track of all the parition paths that are filtered to add block metadata.
> this comment style for classes and functions
Oops, done.


Line 368:         if (!pathMd.filteredParitionPaths.contains(fileStatus.getPath())) continue;
> Looks like this contains() and the indexOf() below can become very expensiv
The new PS design revamps the whole thing. We don't do indexOf() searches anymore.


Line 423:     for (int i=0; i < fileMd.pathFds.size(); ++i) {
> i = 0 (add spaces)
Done


Line 458:       // Only build the storage IDs fs instances that support the BlockLocation
API.
> ... for fs instances ...
Done


Line 720:   private void loadAllPartitions(
> After reading through this a few times I'm still confused about the high-le
The new PS implements a similar logic. We can sync-up on that if required.


Line 777:         if (dirsToLoad.contains(partDir) ||
> positive is more readable to me
Done


http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 275:    * Returns true if the filesystem supports BlockLocation API.
> What API is it specifically? LocatedFileStatus.getBlockLocations()?
Yes, all methods returning 'BlockLocation's.


Line 277:   public static boolean supportsBlockLocationAPI(FileSystem fs) {
> supportsBlockLocationApi
Done


Line 406:   public static boolean isPathChild(Path p1, Path p2) {
> isChildPath(Path p, Path parent)
Done


Line 408:     Path parentDir = p1.getParent();
> I think this can be improved by using the Path.getDepth(). You can do getPa
good idea. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message