impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Impala Public Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
Date Sat, 28 Jan 2017 09:22:10 GMT
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
......................................................................


IMPALA-4789: Fix slow metadata loading due to inconsistent paths.

The fix for IMPALA-4172/IMPALA-3653 introduced a performance
regression for loading tables that have many partitions with:
1. Inconsistent HDFS path qualification or
2. A custom location (not under the table root dir)

For the first issue consider a table whose root path is at
'hdfs://localhost:8020/warehouse/tbl/'.
A partition with an unqualified location '/warehouse/tbl/p=1'
will not be recognized as being a descendant of the table root
dir by FileSystemUtil.isDescendentPath() because of how
Path.equals() behaves, even if 'hdfs://localhost:8020' is the
default filesystem. Such partitions are incorrectly recognized
as having a custom location and are loaded separately.

There were two performance issues:
1. The code for loading the files/blocks of partitions with
   truly custom locations was inefficient with an O(N^2)
   loop for determining the target partitions.
2. Each partition that is incorrectly identified as having
   a custom path (e.g. due to inconsistent qualification),
   is going to have its files/blocks loaded twice. Once
   when the table root path is processed, and once when the
   'custom' partition is processed.

This patch fixes the detection of partitions with custom
locations, and improves the speed of loading partitions
with custom locations.

Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Reviewed-on: http://gerrit.cloudera.org:8080/5743
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 59 insertions(+), 18 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>

Mime
View raw message