Message-Id: <201701280922.v0S9MAkB019125@ip-10-146-233-104.ec2.internal>
Date: Sat, 28 Jan 2017 09:22:10 +0000
From: "Impala Public Jenkins (Code Review)"
To: Alex Behm, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Subject: [Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm
X-Gerrit-MessageType: merged
X-Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
X-Gerrit-Commit: 7b8ffd35534c11ae3caa048229effc97613cd34f
User-Agent: Gerrit/2.12.2
archived-at: Sat, 28 Jan 2017 09:22:24 -0000

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
......................................................................

IMPALA-4789: Fix slow metadata loading due to inconsistent paths.

The fix for IMPALA-4172/IMPALA-3653 introduced a performance
regression for loading tables that have many partitions with:
1. Inconsistent HDFS path qualification or
2. A custom location (not under the table root dir)

For the first issue, consider a table whose root path is at
'hdfs://localhost:8020/warehouse/tbl/'.
A partition with an unqualified location '/warehouse/tbl/p=1'
will not be recognized as being a descendant of the table root
dir by FileSystemUtil.isDescendentPath() because of how
Path.equals() behaves, even if 'hdfs://localhost:8020' is the
default filesystem. Such partitions are incorrectly recognized
as having a custom location and are loaded separately.

There were two performance issues:
1. The code for loading the files/blocks of partitions with
   truly custom locations was inefficient, with an O(N^2) loop
   for determining the target partitions.
2. Each partition that is incorrectly identified as having
   a custom path (e.g. due to inconsistent qualification)
   has its files/blocks loaded twice: once when the table
   root path is processed, and once when the 'custom'
   partition is processed.

This patch fixes the detection of partitions with custom
locations, and improves the speed of loading partitions with
custom locations.

Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Reviewed-on: http://gerrit.cloudera.org:8080/5743
Reviewed-by: Alex Behm
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 59 insertions(+), 18 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker
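
Illustration of the path-qualification issue described in the commit message: the sketch below is not the code from this change. It uses only java.net.URI instead of Hadoop's Path class, and the class name PathQualificationSketch and the helpers qualify() and isDescendant() are hypothetical names chosen for this example. It shows why an unqualified partition location fails a descendant check against a fully qualified table root, and how filling in the default filesystem's scheme and authority first makes the check succeed.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch; not the FileSystemUtil code touched by this change.
class PathQualificationSketch {

    // Fills in the default filesystem's scheme and authority when the
    // location has none. Already-qualified locations are returned as-is
    // (apart from normalization).
    static URI qualify(String location, URI defaultFs) {
        return defaultFs.resolve(URI.create(location)).normalize();
    }

    // Removes trailing slashes so '/warehouse/tbl/' and '/warehouse/tbl'
    // compare the same way.
    static String stripTrailingSlashes(String path) {
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }

    // True if 'child' lies strictly below 'root' on the same filesystem.
    // Scheme and authority must match exactly, which is why an unqualified
    // child never matches a qualified root.
    static boolean isDescendant(URI child, URI root) {
        if (!Objects.equals(child.getScheme(), root.getScheme())) return false;
        if (!Objects.equals(child.getAuthority(), root.getAuthority())) return false;
        String rootPath = stripTrailingSlashes(root.getPath());
        String childPath = stripTrailingSlashes(child.getPath());
        return childPath.startsWith(rootPath + "/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://localhost:8020");
        URI tableRoot = qualify("hdfs://localhost:8020/warehouse/tbl/", defaultFs);

        // Unqualified location compared directly: scheme/authority differ.
        URI raw = URI.create("/warehouse/tbl/p=1");
        System.out.println("unqualified: " + isDescendant(raw, tableRoot));

        // Same location after qualification against the default filesystem.
        URI qualified = qualify("/warehouse/tbl/p=1", defaultFs);
        System.out.println("qualified:   " + isDescendant(qualified, tableRoot));

        // A location outside the table root stays a custom location.
        URI custom = qualify("/data/custom/p=2", defaultFs);
        System.out.println("custom:      " + isDescendant(custom, tableRoot));
    }
}
```

In this sketch only the qualified '/warehouse/tbl/p=1' is treated as a descendant of the table root; the unqualified form and the location under '/data/custom' are not. Qualifying every location once, before any comparison, is the design choice that keeps partitions under the table root from being misclassified as having a custom location.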