impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Date Fri, 04 Nov 2016 21:49:00 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
......................................................................


Patch Set 1:

(11 comments)

Some high-level comments before digging in deeper.

http://gerrit.cloudera.org:8080/#/c/4914/1//COMMIT_MSG
Commit Message:

Line 9: This change enables Impala to use BlockLocation#getStorageIds,
nit: we usually use () after names to denote they are e function/method

i.e. getStorageIds -> getStorageIds()

getFileBlockLocations -> getFileBlockLocations()


Line 12: getFileBlockLocations call which will be  deprecated in Hadoop-3.
extra space after "be"


http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:       clazz = null;
remove


Line 290:     if ( clazz != null ) {
style: extra spaces, we do

if (clazz != null) {
...
}


Line 295:         m = null;
remove


Line 311:     LOG.info("SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? "true" : "false"));
no need for this ternary logic, just:

LOG.info("SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID);


Line 352:   private static int getSequentialDiskId(String storageId)
While this solution is minimally invasive, I'm not a big fan of the extra locking and memory
consumption of this map.

An overall better design is to convert the UUID to a 128-bit int and use that everywhere.
Also see my comments below.

Let me know what you think.


Line 353:   {
move to previous line


Line 379:   private static int getDiskId(String storageId) {
As much as I'd like to use this simpler alternative, I think it could lead to serious performance
problems that could be very hard to debug. I don't think we can use this.

Perhaps we should consider switching to a 128-bit integer to represent the UUID everywhere
(including all thrift structures and the BE). The BE only needs a number that it can mod against
the number of local data dirs (see DiskIoMgr::AssignQueue()).

Using a 128-bit int would also work with the old volume id API.

We have an existing TUniqueId which may be suitable.


Line 437:       BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen());
still using the old API?

My understanding is that switching to a new API for these calls here will make the extra loadDiskIds()
unnecessary.


Line 526:       if (SUPPORTS_STORAGE_ID) {
This seems fine as a first cut, but we should consider taking this a step further in a follow-on
change.

This is still grabbing the storage ids partition-by-partition, but ideally we could get all
the blocks and their locations for an entire table. We'd need to handle partitions with non-standard
locations specially.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal <laszlo.gaal@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.gaal@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message