Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Mon, 29 Aug 2016 00:40:20 +0000 (UTC)
From: "binlijin (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12996476.1470876156000.431961.1472431220613@Atlassian.JIRA>
In-Reply-To: <JIRA.12996476.1470876156000@Atlassian.JIRA>
References: <JIRA.12996476.1470876156000@Atlassian.JIRA> <JIRA.12996476.1470876156186@arcas>
Subject: [jira] [Commented] (HBASE-16393) Improve
 computeHDFSBlocksDistribution
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 29 Aug 2016 00:40:22 -0000


    [ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444404#comment-15444404 ] 

binlijin commented on HBASE-16393:
----------------------------------

Actually, for the HBase use case the first RPC call is not needed; it is only needed to resolve symlinks, but there is no way to bypass it through DistributedFileSystem. If we want to make just one RPC call, we would need to call DFSClient directly, but it is not public.


> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16393.patch
>
>
> Because our cluster is big, I can see that the balancer is slow from time to time. The balancer is also called on master startup, so the startup is slow as well.
> The first thing I am considering is whether we can compute the HDFSBlocksDistribution of different regions in parallel.
> The second is that I think we can improve the computation of a single region's HDFSBlocksDistribution.
> When computing a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path) and then FileSystem#getFileBlockLocations(status, start, length), so there are two namenode RPC calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a LocatedFileStatus with the information we need, which reduces the namenode RPC calls to one. This speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the namenode.
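
For illustration only, here is a minimal sketch of the first idea in the description above: computing each region's block distribution as an independent task on a thread pool. It is not taken from the attached patch. The names BlockDistributionSketch, Block, computeForRegion and computeAll are hypothetical, and the in-memory Block list stands in for the block locations that real code would fetch from the namenode.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from HBase or from the patch.
class BlockDistributionSketch {

    /** Stand-in for one HDFS block: its length and the hosts holding a replica. */
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Sums, per host, the bytes of the blocks that host stores for one region. */
    static Map<String, Long> computeForRegion(List<Block> blocks) {
        Map<String, Long> weights = new HashMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                weights.merge(host, block.length, Long::sum);
            }
        }
        return weights;
    }

    /** Runs computeForRegion for every region on a fixed-size pool and collects the results. */
    static Map<String, Map<String, Long>> computeAll(Map<String, List<Block>> regions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per region; regions do not share state, so no locking is needed.
            Map<String, Future<Map<String, Long>>> pending = new HashMap<>();
            for (Map.Entry<String, List<Block>> entry : regions.entrySet()) {
                List<Block> blocks = entry.getValue();
                pending.put(entry.getKey(), pool.submit(() -> computeForRegion(blocks)));
            }
            // Wait for each task and gather its per-host weights.
            Map<String, Map<String, Long>> result = new HashMap<>();
            for (Map.Entry<String, Future<Map<String, Long>>> entry : pending.entrySet()) {
                result.put(entry.getKey(), entry.getValue().get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing distributions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a region task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<Block>> regions = new HashMap<>();
        regions.put("region-1", Arrays.asList(
            new Block(128L, Arrays.asList("host-a", "host-b")),
            new Block(64L, Arrays.asList("host-a", "host-c"))));
        regions.put("region-2", Arrays.asList(
            new Block(100L, Arrays.asList("host-b"))));
        computeAll(regions, 2).forEach(
            (region, weights) -> System.out.println(region + " -> " + weights));
    }
}
```

In a real cluster the per-region task would be dominated by namenode RPC latency rather than by the summation, which is why running the tasks concurrently could help; the pool size here is an arbitrary choice and would need tuning against namenode load.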


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)