Date: Mon, 29 Aug 2016 00:40:20 +0000 (UTC)
From: "binlijin (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16393) Improve computeHDFSBlocksDistribution

[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444404#comment-15444404 ]

binlijin commented on HBASE-16393:
----------------------------------

Actually, for the HBase use case the first RPC call is not needed. It is needed only for symlinks, but there is no way to bypass it through DistributedFileSystem. If we want to make just one RPC call, we would need to call DFSClient directly, but it is not public.

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16393.patch
>
>
> When our cluster is big, I can see that the balancer is slow from time to time. The balancer is also called on master startup, so startup is slow as well.
> The first idea is to compute the HDFSBlocksDistribution of different regions in parallel.
> The second idea is to speed up computing a single region's HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path) and then FileSystem#getFileBlockLocations(status, start, length), so there are two namenode RPC calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a LocatedFileStatus with the information we need, which reduces the namenode RPC calls to one. This speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the namenode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
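
The RPC saving described in the issue can be illustrated with a small self-contained sketch. It does not use the Hadoop API or the attached patch: `FakeNameNode`, `Block`, `RpcCountSketch`, the sample paths and the host names are all hypothetical stand-ins, and the three stub methods only loosely model FileSystem#getFileStatus, FileSystem#getFileBlockLocations and FileSystem#listLocatedStatus by counting each call as one namenode RPC. The per-host weighting (summing block lengths per host) is likewise an assumption about what a blocks distribution contains.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RpcCountSketch {

    /** One block: its length and the hosts holding a replica (hypothetical type). */
    static final class Block {
        final long length;
        final List<String> hosts;
        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Hypothetical in-memory stand-in for the namenode; every method call counts as one RPC. */
    static final class FakeNameNode {
        private final Map<String, List<Block>> files = new LinkedHashMap<>();
        int rpcCalls = 0;

        void addFile(String path, Block... blocks) {
            files.put(path, Arrays.asList(blocks));
        }

        // Loosely models FileSystem#getFileStatus: returns only the file length.
        long getFileStatus(String path) {
            rpcCalls++;
            long len = 0;
            for (Block b : files.get(path)) len += b.length;
            return len;
        }

        // Loosely models FileSystem#getFileBlockLocations (the range is ignored here).
        List<Block> getFileBlockLocations(String path, long start, long length) {
            rpcCalls++;
            return files.get(path);
        }

        // Loosely models FileSystem#listLocatedStatus: status and locations in one call.
        List<Block> listLocatedStatus(String path) {
            rpcCalls++;
            return files.get(path);
        }
    }

    /** Two simulated RPCs per storefile: status first, then block locations. */
    static Map<String, Long> twoCallDistribution(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> dist = new TreeMap<>();
        for (String path : storeFiles) {
            long length = nn.getFileStatus(path);
            addWeights(dist, nn.getFileBlockLocations(path, 0, length));
        }
        return dist;
    }

    /** One simulated RPC per storefile: located status carries the block locations. */
    static Map<String, Long> oneCallDistribution(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> dist = new TreeMap<>();
        for (String path : storeFiles) {
            addWeights(dist, nn.listLocatedStatus(path));
        }
        return dist;
    }

    // Adds each block's length to the weight of every host that stores it.
    private static void addWeights(Map<String, Long> dist, List<Block> blocks) {
        for (Block b : blocks) {
            for (String host : b.hosts) {
                dist.merge(host, b.length, Long::sum);
            }
        }
    }

    /** Sample data: two storefiles spread over hosts a, b and c. */
    static FakeNameNode sampleNameNode() {
        FakeNameNode nn = new FakeNameNode();
        nn.addFile("/hbase/f1", new Block(128, "a", "b"), new Block(64, "b", "c"));
        nn.addFile("/hbase/f2", new Block(128, "a", "c"));
        return nn;
    }

    public static void main(String[] args) {
        List<String> storeFiles = Arrays.asList("/hbase/f1", "/hbase/f2");

        FakeNameNode before = sampleNameNode();
        Map<String, Long> d1 = twoCallDistribution(before, storeFiles);
        System.out.println("two-call: " + d1 + " rpcs=" + before.rpcCalls);

        FakeNameNode after = sampleNameNode();
        Map<String, Long> d2 = oneCallDistribution(after, storeFiles);
        System.out.println("one-call: " + d2 + " rpcs=" + after.rpcCalls);
    }
}
```

With the sample data, both paths produce the same per-host weights, while the simulated RPC count drops from two per storefile to one. In a real cluster the saving would depend on how the underlying file system implements these calls, which this sketch does not model.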