hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16393) Improve computeHDFSBlocksDistribution
Date Thu, 11 Aug 2016 09:28:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416942#comment-15416942 ]

Yu Li commented on HBASE-16393:
-------------------------------

{quote}
First, I wonder whether we can compute different regions' HDFSBlocksDistribution in parallel.

Second, I think we can speed up computing a single region's HDFSBlocksDistribution.

I know other places can be improved the same way; first improve StoreFileInfo#computeHDFSBlocksDistribution.
I would like to improve the other places in subtasks.
{quote}
Please open the subtasks and attach the patches there for easier review, fella. I guess there should
be at least three? [~aoxiang]

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>         Attachments: HBASE-16393.patch
>
>
> Because our cluster is big, I can see the balancer is slow from time to time. The balancer
> is also called on master startup, so startup is slow as well.
> First, I wonder whether we can compute different regions' HDFSBlocksDistribution in parallel.
> Second, I think we can speed up computing a single region's HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path)
> and then FileSystem#getFileBlockLocations(status, start, length), so there are two namenode RPC
> calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a LocatedFileStatus
> with the information we need, which reduces the namenode RPC calls to one. This speeds up
> computeHDFSBlocksDistribution and also sends fewer RPC calls to the namenode.
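
The RPC saving described in the issue can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop client: `FakeNameNode`, `Block`, and `RpcCountSketch` are hypothetical stand-ins invented here, and their methods only mirror the roles of FileSystem#getFileStatus, FileSystem#getFileBlockLocations, and FileSystem#listLocatedStatus while counting round trips. It is not the attached patch, only a model of the two call patterns.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one HDFS block: its length and the hosts holding a replica.
final class Block {
    final long length;
    final List<String> hosts;
    Block(long length, List<String> hosts) { this.length = length; this.hosts = hosts; }
}

// Hypothetical in-memory "namenode" that only counts round trips (assumption, not Hadoop code).
final class FakeNameNode {
    private final Map<String, List<Block>> files = new HashMap<>();
    int rpcCalls = 0;

    void addFile(String path, List<Block> blocks) { files.put(path, blocks); }

    // Plays the role of FileSystem#getFileStatus: one round trip, returns the file length.
    long getFileLength(String path) {
        rpcCalls++;
        long len = 0;
        for (Block b : files.get(path)) len += b.length;
        return len;
    }

    // Plays the role of FileSystem#getFileBlockLocations: a second round trip.
    // The range arguments are accepted but ignored in this simplified model.
    List<Block> getBlockLocations(String path, long start, long length) {
        rpcCalls++;
        return files.get(path);
    }

    // Plays the role of FileSystem#listLocatedStatus: status and locations in one round trip.
    List<Block> listLocated(String path) {
        rpcCalls++;
        return files.get(path);
    }
}

final class RpcCountSketch {
    // Sum, per host, the bytes of the blocks that host stores.
    static Map<String, Long> weights(List<Block> blocks) {
        Map<String, Long> byHost = new HashMap<>();
        for (Block b : blocks) {
            for (String h : b.hosts) byHost.merge(h, b.length, Long::sum);
        }
        return byHost;
    }

    // Current pattern from the issue: two namenode calls per storefile.
    static Map<String, Long> twoCalls(FakeNameNode nn, String path) {
        long len = nn.getFileLength(path);
        return weights(nn.getBlockLocations(path, 0, len));
    }

    // Proposed pattern: one call that already carries the block locations.
    static Map<String, Long> oneCall(FakeNameNode nn, String path) {
        return weights(nn.listLocated(path));
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.addFile("/hbase/f1", Arrays.asList(
            new Block(128, Arrays.asList("host1", "host2")),
            new Block(64, Arrays.asList("host2", "host3"))));
        Map<String, Long> a = twoCalls(nn, "/hbase/f1");
        int afterTwo = nn.rpcCalls;
        Map<String, Long> b = oneCall(nn, "/hbase/f1");
        System.out.println("same result: " + a.equals(b)
            + ", calls: " + afterTwo + " vs " + (nn.rpcCalls - afterTwo));
    }
}
```

Both paths produce the same per-host weights; only the number of simulated namenode calls differs, which is the point the issue makes about a cluster with many storefiles.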



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
