Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D7B3E200BCC for ; Tue, 29 Nov 2016 18:00:29 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id D6E29160AFC; Tue, 29 Nov 2016 17:00:29 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2C434160B15 for ; Tue, 29 Nov 2016 18:00:29 +0100 (CET) Received: (qmail 39906 invoked by uid 500); 29 Nov 2016 17:00:28 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 39874 invoked by uid 99); 29 Nov 2016 17:00:28 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Nov 2016 17:00:28 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 93988C699F for ; Tue, 29 Nov 2016 17:00:27 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 4-pwd2MRXYtx for ; Tue, 29 Nov 2016 17:00:25 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 78E2F5F649 for ; Tue, 29 Nov 2016 17:00:25 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uATH0EKf006274; Tue, 29 Nov 2016 17:00:14 GMT Message-Id: <201611291700.uATH0EKf006274@ip-10-146-233-104.ec2.internal> Date: Tue, 29 Nov 2016 17:00:13 +0000 From: "Bharath Vissapragada (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Mostafa Mokhtar , Alex Behm Reply-To: bharathv@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4172/IMPALA-3653=3A_Improvements_to_block_metadata_loading=0A?= X-Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 X-Gerrit-ChangeURL: X-Gerrit-Commit: 888af047188629afe5886587fb9bdcbfceecc641 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Tue, 29 Nov 2016 17:00:30 -0000 Bharath Vissapragada has uploaded a new patch set (#9). Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading ...................................................................... IMPALA-4172/IMPALA-3653: Improvements to block metadata loading This patch improves the block metadata loading (locations and disk storage IDs) for partitioned and un-partitioned tables in the Catalog server. Without this patch: ------------------ We loop throuh each and every file in the table/partition directories and call getFileBlockLocations() on it to obtain the block metadata. This results in large no. of RPC calls to namenode, especially with tables with large no. of files/partitions. With this patch: --------------- We move the block metadata querying to use listStatus() call which accepts a directory as input and fetches the 'BlockLocation' objects for every file recursively in that directory. This improves the metadata loading in the following ways. - For non-partitioned tables, we query all the BlockLocations in a single RPC call in the base table directory and load the corresponding disk IDs. - For partitioned tables, we query the BlockLocations for all the partitions residing under the base table directories in a single RPC and then load every partition with non-default partition directory separately. Also, this patch does away with VolumeIds returned by the HDFS NN and uses the new StorageIDs returned by the BlockLocation class. These StorageIDs are UUID strings and hence are mapped to a per-node 0-based index as expected by the backend. In the upcoming versions of Hadoop APIs, getFileBlockStorageLocations() is deprecated and instead the listStatus() returns BlockLocations with storage IDs embedded. This patch makes use of this improvement to reduce an additional RPC to the NN to fetch the storage locations. Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 --- A fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 4 files changed, 371 insertions(+), 266 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5148/9 -- To view, visit http://gerrit.cloudera.org:8080/5148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Mostafa Mokhtar