From: "Mostafa Mokhtar (Code Review)"
To: Bharath Vissapragada, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Alex Behm
Reply-To: mmokhtar@cloudera.com
Date: Sun, 27 Nov 2016 05:32:44 +0000
Subject: [Impala-ASF-CR] IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
X-Gerrit-MessageType: comment
X-Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
X-Gerrit-Commit: 265c75e80a3162b2531c57491ee184462dadfd3f

Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................

Patch Set 4:

Just tried out the latest patch and metadata loading is 5.4x faster.
With the patch, metadata loading for 80 partitions with 250K files finished in 27 seconds, compared to 146 seconds without.
Most of the CPU time is spent in the RemoteIterator; to speed up metadata loading further, I recommend using a thread pool.
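The thread-pool suggestion could look roughly like the sketch below. It is not taken from the patch: `ParallelPartitionLister`, `listPartition`, and `listAll` are hypothetical names, and local `java.nio.file` listing stands in for the remote HDFS directory listing so the example runs without a cluster. The idea it illustrates is only that one listing task per partition directory is submitted to a fixed-size pool, so the per-partition listing calls overlap instead of running one after another.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Impala: lists partition directories in parallel. */
public final class ParallelPartitionLister {
  private ParallelPartitionLister() {}

  /** Lists the regular files directly under one partition directory. */
  static List<Path> listPartition(Path dir) {
    // Assumption: a local listing stands in for the remote HDFS listing call.
    try (Stream<Path> entries = Files.list(dir)) {
      return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Lists every partition on a fixed-size pool; results keep the input order. */
  public static Map<Path, List<Path>> listAll(List<Path> partitionDirs, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one task per partition so the listings can overlap.
      List<Future<List<Path>>> pending = new ArrayList<>();
      for (Path dir : partitionDirs) {
        pending.add(pool.submit(() -> listPartition(dir)));
      }
      // Collect in submission order; get() rethrows any listing failure.
      Map<Path, List<Path>> result = new LinkedHashMap<>();
      for (int i = 0; i < partitionDirs.size(); i++) {
        result.put(partitionDirs.get(i), pending.get(i).get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

A caller would pass the partition directories and a pool size, for example `ParallelPartitionLister.listAll(dirs, 8)`. The pool size here is an arbitrary illustration; against a real NameNode it would need tuning so that parallel listing does not overload the server.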

Sample Count  Percentage(%)  Stack Trace
         509         74.307  org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table)
         509         74.307  org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set)
         507         74.015  org.apache.impala.catalog.HdfsTable.loadAllPartitions(List, Table)
         497         72.555  org.apache.impala.catalog.HdfsTable.loadMetadataAndDiskIds(FileSystem, List, HashMap)
         472         68.905  org.apache.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem, Path, HashMap, Map)
         365         53.285  org.apache.hadoop.fs.FileSystem$5.hasNext()
         339         49.489  org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext()
         258         37.664  org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter()
         258         37.664  org.apache.hadoop.hdfs.DFSClient.listPaths(String, byte[], boolean)
         258         37.664  com.sun.proxy.$Proxy21.getListing(String, byte[], boolean)
         258         37.664  org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Object, Method, Object[])
         258         37.664  org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Method, Object[])
         258         37.664  java.lang.reflect.Method.invoke(Object, Object[])
          81         11.825  org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus.makeQualifiedLocated(URI, Path)

--
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-HasComments: No