impala-reviews mailing list archives

From "Vuk Ercegovac (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Date Mon, 16 Oct 2017 20:35:10 GMT
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned "
            :     "tables. Due to HDFS architectural limitations, it is unlikely to get a linear "
            :     "speed up beyond 5 threads.
When multiple tables are loaded, should I think of the total number of threads as
num_metadata_loading_threads * max_hdfs_parts_parallel_load? If so, does the scaling limit of 5
apply to the total number of threads hitting the namenode, or is it 5 * 16 (per the default settings)?
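
To make the arithmetic in the question concrete, here is a minimal sketch of the worst-case thread count, assuming every table-loading thread fans out to its full per-table loader pool at the same time. The class and method names are hypothetical; the figures 16 and 5 are the ones quoted in the question, and which figure belongs to which flag does not change the product.

```java
// Hypothetical sketch; not code from the change under review.
final class LoaderThreadEstimate {

  /**
   * Worst-case number of threads issuing block metadata requests at once,
   * assuming each table-loading thread runs a full pool of partition loaders.
   */
  static long maxConcurrentLoaders(int tableLoadingThreads, int partLoadersPerTable) {
    return Math.multiplyExact((long) tableLoadingThreads, (long) partLoadersPerTable);
  }

  public static void main(String[] args) {
    // 16 and 5 are the two figures from the question ("5 * 16").
    System.out.println(maxConcurrentLoaders(16, 5));
  }
}
```

If the linear-speedup limit applies to total namenode load rather than to a single table, this product is the number to compare against it.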


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> Correct. This is one of the overheads as noticed in the perf runs and unfor
CONF is the default configuration, and it's loaded once upfront for the lifetime of this class
(L201).
I suspect few filesystems are specified; perhaps we get lucky and there is only one.
Is there potentially a way to make this method cheaper for such cases?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801: 
> Yes, each partition can have its own no. of files, so the work definitely v
Yes, that answers it. It might be useful to try a workload that has the same number of blocks
as your current workload, but distributed non-uniformly across partitions and files.
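
One way to build such a comparison, sketched under assumptions: keep the total block count fixed and vary only how it is split across partitions. The class name, the method names, and the halving rule used for the skewed case are all hypothetical choices for illustration, not something taken from the perf runs.

```java
// Hypothetical sketch of two block layouts with the same total block count.
final class SkewedWorkloadSketch {

  /** Spreads totalBlocks as evenly as possible over the partitions. */
  static int[] uniform(int totalBlocks, int partitions) {
    int[] counts = new int[partitions];
    int base = totalBlocks / partitions;
    int remainder = totalBlocks % partitions;
    for (int i = 0; i < partitions; i++) {
      counts[i] = base + (i < remainder ? 1 : 0);
    }
    return counts;
  }

  /** Gives each partition half of what is left; the last partition takes the rest. */
  static int[] skewed(int totalBlocks, int partitions) {
    int[] counts = new int[partitions];
    int remaining = totalBlocks;
    for (int i = 0; i < partitions - 1; i++) {
      counts[i] = remaining / 2;
      remaining -= counts[i];
    }
    counts[partitions - 1] = remaining;
    return counts;
  }

  static int sum(int[] counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(uniform(1000, 4)));
    System.out.println(java.util.Arrays.toString(skewed(1000, 4)));
  }
}
```

Both layouts sum to the same total, so any difference in load time would come from the distribution rather than from the amount of metadata.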


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks) 
Just for my own info: since this work is triggered by an end user, how is cancellation dealt
with?
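
For reference, a minimal sketch of one common pattern for cancelling a loop over pending futures, assuming cancellation reaches the waiting thread as an interrupt. This is not the HdfsTable code under review; the class, `slowLoad`, `awaitOrCancel`, and `runDemo` are hypothetical names, and only `pendingMdLoadTasks` echoes the quoted line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the HdfsTable implementation under review.
final class CancellableLoadSketch {

  /** Stand-in for one partition's metadata load; sleep() responds to interruption. */
  static Callable<Void> slowLoad(long millis) {
    return () -> {
      Thread.sleep(millis);
      return null;
    };
  }

  /**
   * Waits on each task in turn. If the waiting thread is interrupted (standing in
   * for a user cancelling the request), cancels every task and restores the flag.
   * Returns how many tasks completed before any interruption.
   */
  static int awaitOrCancel(List<Future<Void>> pendingMdLoadTasks) throws ExecutionException {
    int completed = 0;
    try {
      for (Future<Void> task : pendingMdLoadTasks) {
        task.get();
        completed++;
      }
    } catch (InterruptedException e) {
      for (Future<Void> task : pendingMdLoadTasks) {
        task.cancel(true); // interrupts running tasks; unstarted ones will never run
      }
      Thread.currentThread().interrupt();
    }
    return completed;
  }

  /** Runs the scenario once; returns {completedCount, cancelledCount}. */
  static int[] runDemo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<Void>> tasks = new ArrayList<>();
      for (int i = 0; i < 4; i++) {
        tasks.add(pool.submit(slowLoad(10_000)));
      }
      Thread.currentThread().interrupt(); // simulate a cancel arriving before the wait
      int completed = awaitOrCancel(tasks);
      Thread.interrupted(); // clear the flag restored by awaitOrCancel
      int cancelled = 0;
      for (Future<Void> task : tasks) {
        if (task.isCancelled()) cancelled++;
      }
      return new int[] {completed, cancelled};
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    int[] result = runDemo();
    System.out.println("completed=" + result[0] + " cancelled=" + result[1]);
  }
}
```

Whether this fits depends on how a user-side cancel is actually surfaced to the loading thread, which is the open question above; without some such signal, the loop would simply wait for every task to finish.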



-- 
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-impala@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Oct 2017 20:35:10 +0000
Gerrit-HasComments: Yes
