impala-reviews mailing list archives

From "Vuk Ercegovac (Code Review)"
Subject [Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Date Wed, 11 Oct 2017 18:51:23 GMT
Vuk Ercegovac has posted comments on this change.

Change subject: IMPALA-5429: Multi threaded block metadata loading

Patch Set 5:

File be/src/catalog/
PS5, Line 35: DEFINE_int32(num_metadata_loading_threads, 16,
            :     "(Advanced) The number of metadata loading threads (degree of parallelism) to use "
            :     "when loading catalog metadata.");
I'm confused: the commit message talks about not loading from HMS using multiple threads,
yet this flag indicates that HMS is loaded using multiple threads.
File fe/src/main/java/org/apache/impala/catalog/
PS5, Line 217: public int loadedFiles_ = 0;
             :     public int refreshedFiles_ = 0;
             :     public int ignoredFiles_ = 0;
add comments for these; see the question regarding refreshedFiles below, for example.
PS5, Line 368: for (HdfsPartition partition: partitions) partition.setFileDescriptors(
Am I misreading this or does each partition get set to the same list of newly found descriptors?
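The concern can be illustrated with a minimal sketch. The Partition class below is a hypothetical stand-in for HdfsPartition, and the string descriptors are placeholders; whether the patch actually passes one shared list to every partition is exactly what the question asks. The sketch only shows what would follow if it did: all partitions reference the same list object, so a change made through one is visible through the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SharedDescriptorListSketch {
    // Hypothetical stand-in for HdfsPartition; only the descriptor list is modeled.
    static class Partition {
        private List<String> fileDescriptors = new ArrayList<>();
        void setFileDescriptors(List<String> fds) { fileDescriptors = fds; }
        List<String> getFileDescriptors() { return fileDescriptors; }
    }

    // Returns true if both partitions end up referencing the same list object
    // and a mutation made through one partition is visible through the other.
    static boolean sharesOneList() {
        List<String> newFds = new ArrayList<>();
        newFds.add("part-00000");  // placeholder descriptor
        Partition p1 = new Partition();
        Partition p2 = new Partition();
        // Same shape as the loop quoted in the review: one list, many partitions.
        for (Partition p : Arrays.asList(p1, p2)) p.setFileDescriptors(newFds);
        p1.getFileDescriptors().add("part-00001");
        return p1.getFileDescriptors() == p2.getFileDescriptors()
            && p2.getFileDescriptors().size() == 2;
    }

    public static void main(String[] args) {
        System.out.println(sharesOneList());  // prints true
    }
}
```

If sharing is not intended, passing each partition its own copy (for example `new ArrayList<>(newFds)`) would avoid the aliasing.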
PS5, Line 426: new Reference<Long>(Long.valueOf(0)
why not use numUnknownDiskIds here?
PS5, Line 431: ++loadStats.refreshedFiles_;
does refreshedFiles mean "file blocks reloaded" or "file checked for reload and possibly reloaded"?

would be good to track how many times the if-block on L418 was entered since this method is
intended to be used when few changes are present.
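One way to make the two readings of refreshedFiles explicit is to keep separate, commented counters. The sketch below is an assumption, not the patch's code: the names checkedFiles and reloadedFiles are hypothetical, and a modification-time comparison stands in for whatever condition the if-block on L418 actually tests.

```java
class FileLoadStatsSketch {
    // Hypothetical counters; not the fields in the patch.
    static class Stats {
        // Number of files examined to decide whether a reload is needed.
        int checkedFiles = 0;
        // Number of files whose metadata was actually reloaded.
        int reloadedFiles = 0;
    }

    // Compares cached and current modification times pairwise (assumed reload
    // condition) and counts how often the reload branch is taken.
    static Stats refresh(long[] cachedMtimes, long[] currentMtimes) {
        Stats stats = new Stats();
        for (int i = 0; i < cachedMtimes.length; i++) {
            ++stats.checkedFiles;
            if (cachedMtimes[i] != currentMtimes[i]) {
                ++stats.reloadedFiles;  // branch entered: file changed
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Stats s = refresh(new long[] {1, 2, 3}, new long[] {1, 5, 3});
        System.out.println(s.checkedFiles + " checked, " + s.reloadedFiles + " reloaded");
    }
}
```

With counters split this way, a low reloadedFiles relative to checkedFiles would confirm that the method is being used in the few-changes case it is intended for.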
PS5, Line 433: for (HdfsPartition partition: partitions) partition.setFileDescriptors(n
same question as in the load method.
PS5, Line 773: HDFS and S3
just to clarify, HdfsTable covers both hdfs table metadata as well as metadata needed for
PS5, Line 783: getFileSystem(CONF)
I noticed that this is called in many places in this class-- is it because a given table can be
stored on multiple filesystems?
PS5, Line 801: getLoadingThreadPoolSize
can different partitions have different numbers of files? if so, work across threads may vary.
what's costly here: the per-file call, the per-partition call, or the number of blocks per file?
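To make the skew concern concrete, here is a small sketch under stated assumptions: it assumes per-file cost dominates and that partitions are assigned to threads round-robin. Neither assumption is confirmed by the patch, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class PartitionLoadSkewSketch {
    // Assigns partitions to threads round-robin and returns the number of
    // files each thread would have to process.
    static long[] filesPerThread(int[] filesPerPartition, int numThreads) {
        long[] load = new long[numThreads];
        for (int i = 0; i < filesPerPartition.length; i++) {
            load[i % numThreads] += filesPerPartition[i];
        }
        return load;
    }

    public static void main(String[] args) {
        // One large partition and three small ones, split across two threads.
        int[] partitions = {100, 1, 1, 1};
        System.out.println(Arrays.toString(filesPerThread(partitions, 2)));  // prints [101, 2]
    }
}
```

Under these assumptions one thread handles 101 files and the other 2. A shared work queue, where idle threads pick up the next pending partition or file, would even this out, which is why it matters which unit of work is the costly one.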


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Vuk Ercegovac
Gerrit-Comment-Date: Wed, 11 Oct 2017 18:51:23 +0000
Gerrit-HasComments: Yes
