impala-reviews mailing list archives

From "Bharath Vissapragada (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
Date Sat, 26 Nov 2016 23:03:05 GMT
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading

IMPALA-4172/IMPALA-3653: Improvements to block metadata loading

This patch improves block metadata loading (locations and disk
storage IDs) for partitioned and unpartitioned tables in the Catalog.

Without this patch:
We loop through every file in the table/partition directories
and call getFileBlockLocations() on it to obtain the block metadata.
This results in a large number of RPCs to the NameNode, especially
for tables with many files or partitions.

With this patch:
We move the block metadata querying to the listStatus() call, which
accepts a directory as input and fetches the 'BlockLocation' objects
for every file recursively in that directory. This improves
metadata loading in the following ways:

- For non-partitioned tables, we query all the BlockLocations with a
single RPC on the base table directory and load the corresponding
disk IDs.

- For partitioned tables, we query the BlockLocations for all the
partitions residing under the base table directory in a single RPC,
and then load every partition with a non-default partition directory.
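
The RPC savings described in the two cases above can be illustrated
with a small sketch. This is not the patch's code and does not use
the Hadoop API: FakeNameNode, locationsForFile(), locationsUnder()
and loadTable() are hypothetical names, and the "NameNode" is an
in-memory map that only counts block-location round trips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BlockLoadingSketch {
  /** In-memory stand-in for the NameNode (hypothetical, not a Hadoop class). */
  static final class FakeNameNode {
    private final TreeMap<String, List<String>> blocksByFile = new TreeMap<>();
    int rpcCount = 0;  // counts block-location round trips only

    void addFile(String path) {
      List<String> blocks = new ArrayList<>();
      blocks.add(path + "#blk0");  // one fake block per file
      blocksByFile.put(path, blocks);
    }

    /** Old style: one round trip per file. */
    List<String> locationsForFile(String path) {
      rpcCount++;
      return blocksByFile.get(path);
    }

    /** New style: one round trip returns every file under dir, recursively. */
    Map<String, List<String>> locationsUnder(String dir) {
      rpcCount++;
      Map<String, List<String>> result = new TreeMap<>();
      for (Map.Entry<String, List<String>> e : blocksByFile.entrySet()) {
        if (e.getKey().startsWith(dir + "/")) result.put(e.getKey(), e.getValue());
      }
      return result;
    }
  }

  /** One call for the base directory, plus one per partition located outside it. */
  static Map<String, List<String>> loadTable(
      FakeNameNode nn, String baseDir, List<String> partitionDirs) {
    Map<String, List<String>> all = new TreeMap<>(nn.locationsUnder(baseDir));
    for (String p : partitionDirs) {
      if (!p.startsWith(baseDir + "/")) all.putAll(nn.locationsUnder(p));
    }
    return all;
  }

  public static void main(String[] args) {
    FakeNameNode nn = new FakeNameNode();
    nn.addFile("/warehouse/t/p=1/a");
    nn.addFile("/warehouse/t/p=2/b");
    nn.addFile("/other/p3/c");  // partition with a non-default location
    List<String> parts =
        Arrays.asList("/warehouse/t/p=1", "/warehouse/t/p=2", "/other/p3");
    Map<String, List<String>> loaded = loadTable(nn, "/warehouse/t", parts);
    System.out.println(loaded.size() + " files, " + nn.rpcCount + " round trips");
  }
}
```

In this model a non-partitioned table is the case with an empty
partition list, which costs exactly one round trip; the per-file
approach would cost one round trip for every file.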

Also, this patch does away with VolumeIds returned by the HDFS NN
and uses the new StorageIDs returned by the BlockLocation class.
These StorageIDs are UUID strings and hence are mapped to a
per-node 0-based index as expected by the backend. In upcoming
versions of the Hadoop APIs, getFileBlockStorageLocations() is
deprecated and listStatus() instead returns BlockLocations with
storage IDs embedded. This patch makes use of that improvement to
avoid an additional RPC to the NN to fetch the storage locations.
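
A minimal sketch of the UUID-to-index mapping described above,
assuming indexes are handed out per host in first-seen order. The
class name StorageIdIndexer and the method indexOf() are
hypothetical and are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: gives each storage UUID a dense, 0-based index per host. */
final class StorageIdIndexer {
  // host -> (storage UUID -> index on that host)
  private final Map<String, Map<String, Integer>> perHost = new HashMap<>();

  /** Returns the index already assigned to (host, storageId), or assigns the next one. */
  int indexOf(String host, String storageId) {
    Map<String, Integer> ids = perHost.get(host);
    if (ids == null) {
      ids = new HashMap<>();
      perHost.put(host, ids);
    }
    Integer idx = ids.get(storageId);
    if (idx == null) {
      idx = ids.size();  // next unused index on this host
      ids.put(storageId, idx);
    }
    return idx;
  }

  public static void main(String[] args) {
    StorageIdIndexer indexer = new StorageIdIndexer();
    System.out.println(indexer.indexOf("host1", "DS-aaa"));
    System.out.println(indexer.indexOf("host1", "DS-bbb"));
    System.out.println(indexer.indexOf("host2", "DS-ccc"));
  }
}
```

Keeping a separate counter per host means the same UUID always maps
to the same small integer on its node, and indexes on different
nodes are independent of each other.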

Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
A fe/src/main/java/org/apache/impala/catalog/
M fe/src/main/java/org/apache/impala/catalog/
M fe/src/main/java/org/apache/impala/common/
3 files changed, 385 insertions(+), 250 deletions(-)

  git pull ssh:// refs/changes/48/5148/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Mostafa Mokhtar <>
