hbase-issues mailing list archives

From "Mikhail Bautin (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4191) hbase load balancer needs locality awareness
Date Tue, 25 Oct 2011 07:18:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134803#comment-13134803 ]

Mikhail Bautin commented on HBASE-4191:

@Ted: could you please elaborate on how you express the region assignment problem as a Max
Flow problem? If we define the "cost" of assigning a region to a server based on locality,
and define a constraint of "load balancedness" to be such that each regionserver is assigned
no more than approximately ceil(numRegions / numServers) + C regions for some small value
of C, then I can see how the problem becomes a min-cost max-flow problem (http://en.wikipedia.org/wiki/Minimum_cost_flow_problem).
However, I don't see how we could reduce the assignment problem to the max-flow problem directly.
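The min-cost max-flow formulation described above can be sketched as follows. This is an illustration under stated assumptions, not HBase code: the locality index is assumed to be available as a dict keyed by (region, server), an edge's cost is (1 - locality) scaled to an integer by a hypothetical `SCALE` of 1000, and the solver is a plain successive-shortest-path algorithm using Bellman-Ford. `MinCostFlow` and `assign_regions` are hypothetical names. The network is source -> region (capacity 1), region -> server (capacity 1, locality cost), server -> sink (capacity ceil(numRegions / numServers) + C).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical scale factor turning a locality index in [0, 1] into an integer cost.
SCALE = 1000


class MinCostFlow:
    """Successive-shortest-path min-cost max-flow solver (Bellman-Ford paths)."""

    def __init__(self, n: int) -> None:
        self.n = n
        # graph[u] holds edges as [to, residual capacity, cost, index of reverse edge in graph[to]].
        self.graph: List[List[List[int]]] = [[] for _ in range(n)]

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> None:
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def run(self, source: int, sink: int) -> Tuple[int, int]:
        flow = total_cost = 0
        while True:
            # Cheapest augmenting path in the residual graph; Bellman-Ford copes
            # with the negative costs carried by reverse edges.
            dist = [math.inf] * self.n
            prev = [(-1, -1)] * self.n  # (node, edge index) used to reach each node
            dist[source] = 0
            changed = True
            while changed:
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v] = dist[u] + cost
                            prev[v] = (u, i)
                            changed = True
            if dist[sink] == math.inf:
                return flow, total_cost
            # Find the bottleneck capacity along the path, then push that much flow.
            push = math.inf
            v = sink
            while v != source:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = sink
            while v != source:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[sink]


def assign_regions(regions: List[str], servers: List[str],
                   locality: Dict[Tuple[str, str], float], slack: int) -> Dict[str, str]:
    """Assign each region to one server, capping every server at ceil(R / S) + slack regions."""
    r, s = len(regions), len(servers)
    source, sink = 0, r + s + 1  # regions are nodes 1..r, servers are r+1..r+s
    mcf = MinCostFlow(r + s + 2)
    cap = math.ceil(r / s) + slack
    for i, region in enumerate(regions):
        mcf.add_edge(source, 1 + i, 1, 0)
        for j, server in enumerate(servers):
            # Low locality means high cost; missing pairs are treated as locality 0.
            loc = locality.get((region, server), 0.0)
            mcf.add_edge(1 + i, 1 + r + j, 1, round((1.0 - loc) * SCALE))
    for j in range(s):
        mcf.add_edge(1 + r + j, sink, cap, 0)
    mcf.run(source, sink)
    assignment: Dict[str, str] = {}
    for i, region in enumerate(regions):
        for v, residual, _, _ in mcf.graph[1 + i]:
            # A saturated region -> server edge carries this region's unit of flow.
            if r < v <= r + s and residual == 0:
                assignment[region] = servers[v - 1 - r]
    return assignment


locality = {("r1", "rs1"): 1.0, ("r2", "rs1"): 0.9, ("r2", "rs2"): 0.8,
            ("r3", "rs1"): 0.7, ("r4", "rs2"): 1.0}
print(assign_regions(["r1", "r2", "r3", "r4"], ["rs1", "rs2"], locality, slack=0))
# {'r1': 'rs1', 'r2': 'rs2', 'r3': 'rs1', 'r4': 'rs2'}
```

In this example a greedy pass that sends every region to its most local server would put r1, r2 and r3 on rs1 and exceed the cap of 2; the flow formulation instead moves r2, whose locality drops least (0.9 to 0.8), and reaches the minimum total cost under the balance constraint.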
> hbase load balancer needs locality awareness
> --------------------------------------------
>                 Key: HBASE-4191
>                 URL: https://issues.apache.org/jira/browse/HBASE-4191
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Ted Yu
>            Assignee: Liyin Tang
> Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide HFile-level locality information.
> But in order to work with the load balancer and region assignment, we need region-level locality information.
> Let's define the region locality information first; it is almost the same as the HFile locality index.
> HRegion locality index (HRegion A, RegionServer B) =
> (Total number of HDFS blocks of HRegion A that can be retrieved locally by RegionServer B) / (Total number of HDFS blocks of HRegion A)
> So the HRegion locality index tells us how much locality we get if the HMaster assigns HRegion A to RegionServer B.
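A minimal sketch of the locality index as defined above, assuming the region's HDFS blocks (across all of its HFiles) are given as a list with one set of replica host names per block, and that a RegionServer is identified by its host name. `region_locality_index` is a hypothetical helper, not an HBase API; it counts blocks exactly as the definition does.

```python
from typing import Iterable, Set


def region_locality_index(block_hosts: Iterable[Set[str]], server_host: str) -> float:
    """Fraction of a region's HDFS blocks that have a replica on server_host.

    block_hosts: one entry per HDFS block of the region, each entry being the
    set of hosts holding a replica of that block (assumed input format).
    """
    blocks = list(block_hosts)
    if not blocks:
        return 0.0  # assumed convention: a region with no blocks has no locality
    local = sum(1 for hosts in blocks if server_host in hosts)
    return local / len(blocks)


blocks = [{"rs1", "rs2", "rs3"}, {"rs2", "rs4", "rs5"},
          {"rs1", "rs4", "rs6"}, {"rs3", "rs5", "rs6"}]
print(region_locality_index(blocks, "rs1"))  # 2 of 4 blocks are local -> 0.5
```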
> Assigning regions based on locality therefore involves 2 steps.
> 1) At cluster startup, the master scans HDFS to calculate the "HRegion locality index" for each pair of HRegion and RegionServer. Scanning HDFS is expensive, so we only need to do this once, at startup.
> 2) At cluster run time, each region server periodically updates the "HRegion locality index" as a metric, as HBASE-4114 did. The RegionServer can expose these values to the Master through ZK, the meta table, or plain RPC messages.
> Based on the "HRegion locality index", the assignment manager in the master would have global knowledge of the region locality distribution. Treating the "HRegion locality index" as the capacity between the region server set and the region set, the assignment manager could then run a MAXIMUM FLOW solver to reach a global optimum.
> The master should also share this global view with the secondary master in case of a master failover.
> In addition, HBASE-4491 (Locality Checker) is a tool, based on the same metrics, that proactively scans HDFS to calculate global locality information for the cluster. It will help us verify data locality at run time.
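A locality check of this kind can reduce the per-region numbers to one cluster-wide figure for the current assignment. The sketch below is hypothetical (it is not the HBASE-4491 code): it assumes the same per-block replica-host input as the definition of the index, and weights every region by its block count, so the result is the fraction of all blocks that are local to the server hosting their region.

```python
from typing import Dict, List, Set


def cluster_locality(assignment: Dict[str, str],
                     region_blocks: Dict[str, List[Set[str]]]) -> float:
    """Fraction of all region blocks stored on the server currently hosting the region."""
    local = total = 0
    for region, server in assignment.items():
        blocks = region_blocks.get(region, [])
        total += len(blocks)
        local += sum(1 for hosts in blocks if server in hosts)
    return local / total if total else 0.0


region_blocks = {"r1": [{"rs1", "rs2"}, {"rs1", "rs3"}], "r2": [{"rs2", "rs3"}]}
print(cluster_locality({"r1": "rs1", "r2": "rs1"}, region_blocks))  # 2 of 3 blocks local
```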
