impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Date Fri, 04 Nov 2016 21:49:00 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs

Patch Set 1:


Some high-level comments before digging in deeper.
Commit Message:

Line 9: This change enables Impala to use BlockLocation#getStorageIds,
nit: we usually use () after names to denote they are e function/method

i.e. getStorageIds -> getStorageIds()

getFileBlockLocations -> getFileBlockLocations()

Line 12: getFileBlockLocations call which will be  deprecated in Hadoop-3.
extra space after "be"
File fe/src/main/java/org/apache/impala/catalog/

Line 287:       clazz = null;

Line 290:     if ( clazz != null ) {
style: extra spaces, we do

if (clazz != null) {

Line 295:         m = null;

Line 311:"SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? "true" : "false"));
no need for this ternary logic, just:"SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID);

Line 352:   private static int getSequentialDiskId(String storageId)
While this solution is minimally invasive, I'm not a big fan of the extra locking and memory
consumption of this map.

An overall better design is to convert the UUID to a 128-bit int and use that everywhere.
Also see my comments below.

Let me know what you think.

Line 353:   {
move to previous line

Line 379:   private static int getDiskId(String storageId) {
As much as I'd like to use this simpler alternative, I think it could lead to serious performance
problems that could be very hard to debug. I don't think we can use this.

Perhaps we should consider switching to a 128-bit integer to represent the UUID everywhere
(including all thrift structures and the BE). The BE only needs a number that it can mod against
the number of local data dirs (see DiskIoMgr::AssignQueue()).

Using a 128-bit int would also work with the old volume id API.

We have an existing TUniqueId which may be suitable.

Line 437:       BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen());
still using the old API?

My understanding is that switching to a new API for these calls here will make the extra loadDiskIds()

Line 526:       if (SUPPORTS_STORAGE_ID) {
This seems fine as a first cut, but we should consider taking this a step further in a follow-on

This is still grabbing the storage ids partition-by-partition, but ideally we could get all
the blocks and their locations for an entire table. We'd need to handle partitions with non-standard
locations specially.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Laszlo Gaal <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message