impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
Date Wed, 30 Nov 2016 02:40:36 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 11:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/5148/11/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 29:  * ids. This is internally implemented as a static object to share the storageIDs
across
> Several sentences Starting with "This", sometimes not clear what "This" ref
Done


Line 44:     // sequential 0-based integer disk id used by the hdfs scanners. This assumes
that
> used by the BE scanners.
Done


Line 46:     private ConcurrentHashMap<String, Integer> storageIdToInt =
> storageUuidToDiskId
Done


Line 50:     // with the corresponding latest 0-based integer ID. Given a storage UUID, we
first
> with -> to
Done


Line 51:     // look in 'storageIdtoInt' map if it already exists, else we generate a new
ID for
> no need for this second sentence, that's what we do in getDiskId()
Done


Line 64:       if (Strings.isNullOrEmpty(storageUuid)) return diskId;
> let's move this check to the caller and require the storageUuid to be non-n
Done


Line 68:       // across the cluster, it is possible that we have a good hit rate.
> it is possible -> we expect to have a good hit rate.
Done


Line 72:         // At this point, it could be possible due to race, that storageId might've
been
> // Mapping might have been added by another thread that entered the synchro
Done


Line 76:         // No mapping exists, create a new disk ID for 'storageId'
> stoprageUuid
Done


Line 80:           // Host hasn't been populated yet. Start with a 0-based id.
> // First disk id of this host.
Done


http://gerrit.cloudera.org:8080/#/c/5148/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 289:    * given directory 'dirPath'. Only blocks corresponding to paths from 'partsByPath'
> Mention that the block metadata is effectively dropped and re-created.
Done


Line 307:       RemoteIterator<LocatedFileStatus> fileStatusIter = fs.listFiles(dirPath,
true);
> move right before the while loop
Done


Line 308:       // Reset the current state of partitions under dirPath from perPartitionFileDescMap_
> Reset -> Clear
Done


Line 310:       for (Path partDir: partsByPath.keySet()) {
> Iterate over the entrySet(), you are using an unnecessary get(partDir) belo
Done


Line 321:         String fileName = fileStatus.getPath().getName();
> move closer to where it's used, i.e. right above L336
Done


Line 322:         if (!FileSystemUtil.isValidPartitionFile(fileStatus)) continue;
> isValidDataFile()
Done


Line 323:         // At this point, file status points to a valid file under the directory.
Now we
> // Find the partition that this file belongs (if any).
Done


Line 325:         Path partPathDir = fileStatus.getPath().getParent();
> move closer to where it's used, i.e. L357
It is used in the next 2,3 lines? Moved partPathDirName to line before the loop.


Line 330:         // Skip if this file doesn't correspond to any of the existing partitions.
> Skip if this file does not belong to any known partition.
Done


Line 332:           LOG.debug("File " + fileStatus.getPath().toString() + " doesn't correspond
" +
> add if to check log level
Done


Line 333:               " to a valid partition. Skipping metadata load for this file.");
> to a known partition
Done


Line 361:         Preconditions.checkState(perPartitionFileDescMap_.containsKey(partPathDirName));
> avoid duplicate lookup with get() + Preconditions.checkNotNull()
Done


Line 363:         // Update the partitions metadata corresponding to this file descriptor
> partitions' metadata that this file belongs to.
Done


Line 371:         LOG.warn("Unknown disk id count for filesystem " + fs + ":" + unknownDiskIdCount);
> add if to check log level
Done


Line 374:       throw new RuntimeException("Error loading block md for directory "
> block metadata
Done


Line 380:    * Loads the disk IDs for BlockLocation 'location' and its corresponding file
block.
> Comment should mention the storage UUID to disk id mapping.
Done


Line 386:     String[] hosts;
> Do you know if these are the unique hosts? I'm wondering if storageIds and 
Earlier code had a if block for this. Per my understanding HDFS doesn't place more than one
copy of the same block on the same machine. Still, I think its good to have it for debugging.
I'm adding it with an error message.


Line 394:     for (int i =0; i < storageIds.length; ++i) {
> int i = 0; (space after "=")
Done


Line 410:     for (Path partPath: partsByPath.keySet()) {
> iterate over Map.Entry, you are doing a get() in the next line
Done


Line 412:       // For each valid file in the partPath, synthesize the block metadata.
> For each data file in the partPath...
Done


Line 427:         // We assume all the partitions corresponding to the same partition path,
have
> For the purpose of synthesizing block metadata, we assume that all partitio
Done


Line 447:           perPartitionFileDescMap_.put(partition.getLocation(), fileDescMap);
> move put() inside the if (fileDescMap == null) block
Done


Line 673:    * Create HdfsPartition objects corresponding to 'msPartitions' and add them to
this
> Briefly list the high-level steps, in particular that we first create the p
Done. Let me know if there can be any improvement.


Line 742:         // Check if the partition directory is a child dir of tbl directory location.
> remove, comment only states what the code does
Done


Line 752:     // Load the block metadata
> remove comment (doesn't add anything)
Done


Line 764:       throw new CatalogException("Error loading block location md for partition
" +
> md -> metadata
Done


Line 827:   public HdfsPartition createPartition(StorageDescriptor storageDescriptor,
> change this to createAndLoadPartition()? then we can get rid of the flag in
Done


http://gerrit.cloudera.org:8080/#/c/5148/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 276:    * Returns true if the file corresponding to 'fileStatus' is a valid file as per
> is a valid data file
Done


Line 278:    * directory or a hidden file starting with . or _ or if it is a LZO index file.
> hidden file is enough, no need to mention specific prefixes
Done


Line 291:   public static boolean supportsBlockLocationApi(FileSystem fs) {
> supportsBlockLocations()?
I'm fine with either. Actually BlockLocation is supported by generic FileSystem interface.
So I think that isn't perfect either but wfm.


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message