impala-reviews mailing list archives

From "Vuk Ercegovac (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Date Wed, 11 Oct 2017 18:51:23 GMT
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/5/be/src/catalog/catalog.cc@35
PS5, Line 35: DEFINE_int32(num_metadata_loading_threads, 16,
            :     "(Advanced) The number of metadata loading threads (degree of parallelism) to use "
            :     "when loading catalog metadata.");
I'm confused: the commit message talks about not loading from HMS using multiple threads,
but this flag indicates that HMS is loaded using multiple threads.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217
PS5, Line 217: public int loadedFiles_ = 0;
             :     public int refreshedFiles_ = 0;
             :     public int ignoredFiles_ = 0;
Add comments for these; see the question about refreshedFiles_ below, for example.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368
PS5, Line 368: for (HdfsPartition partition: partitions) partition.setFileDescriptors(
Am I misreading this or does each partition get set to the same list of newly found descriptors?
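
To make the concern concrete, here is a minimal sketch. It uses a hypothetical stand-in Partition class and string descriptors, not the actual HdfsPartition or its file descriptor type, and it assumes the setter stores the reference it is given. Whether sharing matters in the patch depends on whether the list is mutated later, which I have not verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedDescriptorsSketch {
  // Hypothetical stand-in for a partition; not the real HdfsPartition.
  static final class Partition {
    private List<String> fileDescriptors = new ArrayList<String>();

    // Assumption: the setter keeps the caller's list reference as-is.
    void setFileDescriptors(List<String> fds) { fileDescriptors = fds; }

    List<String> getFileDescriptors() { return fileDescriptors; }
  }

  // Hands the same descriptors to two partitions, mutates the first
  // partition's list, and reports whether the second partition sees it.
  static boolean changeLeaks(boolean copyPerPartition) {
    List<String> newFds = new ArrayList<String>(Arrays.asList("f1", "f2"));
    Partition p1 = new Partition();
    Partition p2 = new Partition();
    for (Partition p : Arrays.asList(p1, p2)) {
      // Either share the one list or give each partition its own copy.
      p.setFileDescriptors(copyPerPartition ? new ArrayList<String>(newFds) : newFds);
    }
    p1.getFileDescriptors().add("f3");
    return p2.getFileDescriptors().contains("f3");
  }

  public static void main(String[] args) {
    System.out.println("shared list, change leaks: " + changeLeaks(false));
    System.out.println("copied list, change leaks: " + changeLeaks(true));
  }
}
```

With a shared reference, the addition through the first partition is visible through the second; with a per-partition copy it is not. If sharing is intentional here, a comment saying so would help.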


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@426
PS5, Line 426: new Reference<Long>(Long.valueOf(0)
why not use numUnknownDiskIds here?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@431
PS5, Line 431: ++loadStats.refreshedFiles_;
does refreshedFiles mean "file blocks reloaded" or "file checked for reload and possibly reloaded"?

would be good to track how many times the if-block on L418 was entered since this method is
intended to be used when few changes are present.
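
A rough sketch of what I have in mind, assuming refreshedFiles_ means "checked for reload". The class name, the reloadedFiles_ counter, and the recordRefresh helper are all hypothetical and not part of the patch; the field comments are one possible reading of the existing counters.

```java
final class LoadStatsSketch {
  // Files whose block metadata was loaded from scratch.
  int loadedFiles_ = 0;
  // Assumption: files checked during a refresh, whether or not reloaded.
  int refreshedFiles_ = 0;
  // Hypothetical extra counter: files whose metadata was actually reloaded,
  // i.e. how often the "file changed" branch was taken.
  int reloadedFiles_ = 0;
  // Files skipped during loading.
  int ignoredFiles_ = 0;

  // Records one refresh check; `changed` says whether the file needed
  // its block metadata reloaded.
  void recordRefresh(boolean changed) {
    ++refreshedFiles_;
    if (changed) ++reloadedFiles_;
  }

  public static void main(String[] args) {
    LoadStatsSketch stats = new LoadStatsSketch();
    stats.recordRefresh(true);
    stats.recordRefresh(false);
    stats.recordRefresh(false);
    System.out.println("checked=" + stats.refreshedFiles_
        + " reloaded=" + stats.reloadedFiles_);
  }
}
```

Keeping the two counts separate makes it easy to tell from the stats whether a refresh really only touched a few files.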


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@433
PS5, Line 433: for (HdfsPartition partition: partitions) partition.setFileDescriptors(n
same question as in the load method.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773
PS5, Line 773: HDFS and S3
Just to clarify: HdfsTable covers both HDFS table metadata and the metadata needed for
S3?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: getFileSystem(CONF)
I noticed that this is called in many places in this class. Is that because a given table
can be stored on multiple filesystems?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801: getLoadingThreadPoolSize
Can different partitions have different numbers of files? If so, the work per thread may vary.
What's costly here: the per-file call, the per-partition call, or the number of blocks per file?
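
To illustrate why the granularity matters, here is a small simulation, not a measurement of the patch. It models a fixed pool of workers pulling tasks from a FIFO queue and assumes, purely for illustration, that cost is proportional to the number of files. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class LoadBalanceSketch {
  // Simulates `threads` workers taking tasks in order, each task going to
  // the worker that frees up first. Returns when the last task finishes.
  static long makespan(long[] taskCosts, int threads) {
    PriorityQueue<Long> freeAt = new PriorityQueue<Long>();
    for (int i = 0; i < threads; ++i) freeAt.add(0L);
    long finish = 0;
    for (long cost : taskCosts) {
      long done = freeAt.poll() + cost;
      freeAt.add(done);
      finish = Math.max(finish, done);
    }
    return finish;
  }

  public static void main(String[] args) {
    // One task per partition: one partition with 100 files, three with 1.
    long[] perPartition = {100, 1, 1, 1};
    // One task per file: the same 103 files as unit-cost tasks.
    long[] perFile = new long[103];
    Arrays.fill(perFile, 1L);
    System.out.println("per-partition tasks: " + makespan(perPartition, 2));
    System.out.println("per-file tasks: " + makespan(perFile, 2));
  }
}
```

Under these assumptions, with two threads the per-partition split finishes at time 100 because the one large partition dominates, while the per-file split finishes at time 52. If the real cost is dominated by a per-partition call instead, the picture changes, hence the question.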



-- 
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Oct 2017 18:51:23 +0000
Gerrit-HasComments: Yes
