hbase-issues mailing list archives

From "Charlie Qiangeng Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17110) Improve SimpleLoadBalancer to always take server-level balance into account
Date Tue, 02 May 2017 09:31:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992622#comment-15992622 ]

Charlie Qiangeng Xu commented on HBASE-17110:

I have several sticky tasks on hand right now, but I will definitely squeeze in time for this :)

> Improve SimpleLoadBalancer to always take server-level balance into account
> ---------------------------------------------------------------------------
>                 Key: HBASE-17110
>                 URL: https://issues.apache.org/jira/browse/HBASE-17110
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>             Fix For: 2.0.0
>         Attachments: HBASE-17110.patch, HBASE-17110-V2.patch, HBASE-17110-V3.patch, HBASE-17110-V4.patch,
>                      HBASE-17110-V5.patch, HBASE-17110-V6.patch, HBASE-17110-V7.patch, HBASE-17110-V8.patch
> Currently, with the byTable strategy, server-level imbalance can still occur. This JIRA
> improves that.
> Some more background:
> When operating large-scale clusters (our case), some companies still prefer
> {{SimpleLoadBalancer}} for its simplicity, quick balance plan generation, etc. The current
> SimpleLoadBalancer has two modes:
> 1. byTable, which only guarantees that the regions of each table are uniformly distributed.
> 2. byCluster, which ignores the distribution within tables and balances the regions all
> If the load differs across tables, the byTable option is preferable in most cases.
> Yet this choice sacrifices cluster-level balance and can leave some servers with
> significantly higher load, e.g. 242 regions on server A but 417 regions on server B
> (real-world stats).
> Consider this case: a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster is already perfectly balanced at the
> table level. But an ideal state would look like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> The server loads change from 3,3,3,0 to 2,2,3,2, while table1, table2 and table3 each
> remain balanced. This is the goal this JIRA tries to achieve.
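
The two layouts above can be checked mechanically. The following is a minimal Python sketch, not code from the patch: the dictionaries simply transcribe the two {noformat} listings, and the helper names (server_loads, table_spread, load_spread) are made up for illustration. It shows that each table stays within a spread of one region per server in both layouts, while the server-level spread drops from 3 to 1.

```python
# Layouts transcribed from the example: server -> list of (table, region id).
before = {
    "A": [("table1", 1), ("table2", 1), ("table3", 1)],
    "B": [("table1", 2), ("table2", 2), ("table3", 2)],
    "C": [("table1", 3), ("table2", 3), ("table3", 3)],
    "D": [],
}
after = {
    "A": [("table2", 1), ("table3", 1)],
    "B": [("table1", 2), ("table3", 2)],
    "C": [("table1", 3), ("table2", 3), ("table3", 3)],
    "D": [("table1", 1), ("table2", 2)],
}


def server_loads(layout):
    """Total number of regions hosted by each server."""
    return {server: len(regions) for server, regions in layout.items()}


def load_spread(layout):
    """Difference between the most and least loaded server."""
    loads = server_loads(layout).values()
    return max(loads) - min(loads)


def table_spread(layout):
    """For each table, max minus min region count across all servers."""
    tables = {table for regions in layout.values() for table, _ in regions}
    spread = {}
    for table in tables:
        counts = [
            sum(1 for t, _ in regions if t == table)
            for regions in layout.values()
        ]
        spread[table] = max(counts) - min(counts)
    return spread


if __name__ == "__main__":
    for name, layout in (("before", before), ("after", after)):
        print(name, server_loads(layout), load_spread(layout), table_spread(layout))
```

Each table has 3 regions over 4 servers, so a per-table spread of 1 is the best either layout can do; only the server-level spread distinguishes them.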
> Two unit tests will be added as well, with the last one demonstrating the advantage of the new
> strategy. Also, an onConfigurationChange method will be implemented so that the "slop" variable
> can be changed at runtime.
> We have been using this strategy on our largest cluster for several months, so its effect
> has been validated to some extent.
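
The description mentions the "slop" variable without defining it. As a rough illustration only, the Python sketch below assumes slop is a tolerance around the average regions-per-server, with acceptable loads lying between floor(average * (1 - slop)) and ceil(average * (1 + slop)). That formula, the function names and the value 0.2 are assumptions made here for illustration and are not taken from the patch.

```python
import math


def slop_bounds(loads, slop):
    """Acceptable (low, high) region counts per server, assuming slop is a
    fractional tolerance around the average load (an assumption of this sketch)."""
    average = sum(loads) / len(loads)
    low = math.floor(average * (1 - slop))
    high = math.ceil(average * (1 + slop))
    return low, high


def within_slop(loads, slop):
    """True if every server's load falls inside the slop bounds."""
    low, high = slop_bounds(loads, slop)
    return all(low <= load <= high for load in loads)


if __name__ == "__main__":
    # Server loads from the example in the description, with an illustrative slop.
    print(within_slop([3, 3, 3, 0], 0.2))
    print(within_slop([2, 2, 3, 2], 0.2))
```

Under these assumptions the 3,3,3,0 layout falls outside the tolerance because of the empty server, while 2,2,3,2 falls inside it. A runtime-adjustable slop would move these bounds without restarting the master.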

This message was sent by Atlassian JIRA