hbase-issues mailing list archives

From "Guanghao Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16393) Improve computeHDFSBlocksDistribution
Date Thu, 11 Aug 2016 01:18:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416327#comment-15416327 ]

Guanghao Zhang commented on HBASE-16393:

+1 on this idea. We have seen this in our production cluster, too: the balancer is too slow when
there are a lot of regions, and some default balancer configs are too small for a big cluster.
Maybe we can make the default config values depend on the number of regions.

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>         Attachments: HBASE-16393.patch
> Because our cluster is big, I can see that the balancer is slow from time to time. The balancer
> is also called on master startup, so startup is slow as well.
> First, I wonder whether we can compute the HDFSBlocksDistribution of different regions in parallel.

> Second, I think we can improve how a single region's HDFSBlocksDistribution is computed.
> To compute a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path)
> and then FileSystem#getFileBlockLocations(status, start, length), so there are two namenode RPC
> calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a
> LocatedFileStatus that carries the information we need, reducing the namenode RPC calls to one.
> This speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the namenode.
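The first idea in the quoted description, computing each region's HDFSBlocksDistribution in parallel rather than one region at a time, can be sketched with a plain Java thread pool. This is a minimal, self-contained sketch and not the attached HBASE-16393.patch: `computeRegionLocality` is a hypothetical stand-in for the per-region work (which in HBase would talk to the namenode), and the pool size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRegionLocality {

    // Hypothetical stand-in for computing one region's block distribution.
    // Real code would list the region's store files and fetch block locations
    // from the namenode; here it just derives a number from the name.
    static long computeRegionLocality(String regionName) {
        return regionName.length();
    }

    // Submits one task per region to a fixed-size pool, then collects the
    // results in the original region order.
    static Map<String, Long> computeAll(List<String> regions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String region : regions) {
                Callable<Long> task = () -> computeRegionLocality(region);
                futures.add(pool.submit(task));
            }
            Map<String, Long> results = new LinkedHashMap<>();
            for (int i = 0; i < regions.size(); i++) {
                // get() blocks until that region's task has finished.
                results.put(regions.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> regions = List.of("region-a", "region-bb", "region-ccc");
        System.out.println(computeAll(regions, 4)); // {region-a=8, region-bb=9, region-ccc=10}
    }
}
```

A bounded pool keeps the number of concurrent namenode requests capped, so parallelism speeds up the balancer without flooding the namenode; the right pool size for a real cluster is not stated in the issue.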
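The second idea replaces two namenode round trips per storefile (FileSystem#getFileStatus followed by FileSystem#getFileBlockLocations) with one FileSystem#listLocatedStatus call whose LocatedFileStatus already carries the block locations. To stay self-contained, the sketch below does not use Hadoop; it models the namenode as a hypothetical in-memory `FakeNameNode` that counts every call as one RPC, so the saving is visible. The class names, the `Block` record, and the per-host weight map (which loosely mirrors what an HDFSBlocksDistribution accumulates: bytes of block replicas per host) are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocatedStatusSketch {

    // One block: its length in bytes and the datanodes holding replicas.
    record Block(long length, List<String> hosts) {}

    // Hypothetical in-memory namenode; every method call counts as one RPC.
    static class FakeNameNode {
        private final Map<String, List<Block>> files;
        int rpcCount = 0;

        FakeNameNode(Map<String, List<Block>> files) {
            this.files = files;
        }

        // Models getFileStatus: one RPC, returns only the file length.
        long getFileLength(String path) {
            rpcCount++;
            return files.get(path).stream().mapToLong(Block::length).sum();
        }

        // Models getFileBlockLocations: a second RPC for the block list.
        // The byte range is ignored in this model.
        List<Block> getBlockLocations(String path, long start, long len) {
            rpcCount++;
            return files.get(path);
        }

        // Models listLocatedStatus on one file: a single RPC returning the
        // status and its block locations together.
        List<Block> listLocated(String path) {
            rpcCount++;
            return files.get(path);
        }
    }

    // Adds each block's length to every host that stores a replica of it.
    static void addBlocks(Map<String, Long> weights, List<Block> blocks) {
        for (Block b : blocks) {
            for (String host : b.hosts()) {
                weights.merge(host, b.length(), Long::sum);
            }
        }
    }

    // Current approach: two RPCs per storefile.
    static Map<String, Long> twoCalls(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> weights = new TreeMap<>();
        for (String f : storeFiles) {
            long len = nn.getFileLength(f);
            addBlocks(weights, nn.getBlockLocations(f, 0, len));
        }
        return weights;
    }

    // Proposed approach: one RPC per storefile.
    static Map<String, Long> oneCall(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> weights = new TreeMap<>();
        for (String f : storeFiles) {
            addBlocks(weights, nn.listLocated(f));
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, List<Block>> files = Map.of(
            "sf1", List.of(new Block(128, List.of("dn1", "dn2"))),
            "sf2", List.of(new Block(64, List.of("dn2", "dn3"))));
        List<String> storeFiles = List.of("sf1", "sf2");

        FakeNameNode before = new FakeNameNode(files);
        System.out.println(twoCalls(before, storeFiles) + " rpcs=" + before.rpcCount);
        // {dn1=128, dn2=192, dn3=64} rpcs=4

        FakeNameNode after = new FakeNameNode(files);
        System.out.println(oneCall(after, storeFiles) + " rpcs=" + after.rpcCount);
        // {dn1=128, dn2=192, dn3=64} rpcs=2
    }
}
```

Both paths produce the same per-host weights; only the number of namenode calls changes, which is the point of the proposal.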

This message was sent by Atlassian JIRA
