hbase-dev mailing list archives

From "Charlie Qiangeng Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves
Date Mon, 07 Nov 2016 04:11:58 GMT
Charlie Qiangeng Xu created HBASE-17039:

             Summary: SimpleLoadBalancer schedules large amount of invalid region moves
                 Key: HBASE-17039
                 URL: https://issues.apache.org/jira/browse/HBASE-17039
             Project: HBase
          Issue Type: Bug
          Components: Balancer
    Affects Versions: 1.2.3, 1.1.6, 2.0.0
            Reporter: Charlie Qiangeng Xu
            Assignee: Charlie Qiangeng Xu
             Fix For: 2.0.0, 1.2.3, 1.1.6

After growing one of our clusters to 1600 nodes, we observed a large number of invalid
region moves (more than 30,000) fired by the balance chore. We simulated the
problem and printed out the balance plan, and found that many servers holding two regions
of a given table (we use the by-table strategy) sent both regions to two other servers
that had zero regions.
In the SimpleLoadBalancer's balanceCluster function,
the code block that determines the underLoadedServers might have a problem:
      if (load >= min && load > 0) {
        continue; // look for other servers which haven't reached min
      }
      int regionsToPut = min - load;
      if (regionsToPut == 0)
        regionsToPut = 1;
If min is zero, a server with a load of zero, which equals min, is not skipped by the
first check. It is marked as underloaded, and regionsToPut is bumped from 0 to 1, which
causes the problem described above.
Since we grew the cluster to 1600+ servers, many tables that have only 1000 regions
now hit this issue.
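
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the actual SimpleLoadBalancer code: the class and method names are hypothetical, and it assumes that min is the floor of the average number of regions per server for the table. It only mirrors the quoted check to show what happens to an empty server when min is zero.

```java
// Hypothetical stand-alone sketch; not part of HBase.
final class UnderloadCheckSketch {

    // Assumption: min is the average regions per server, rounded down.
    static int perServerMin(int regions, int servers) {
        return (int) Math.floor((double) regions / servers);
    }

    // Mirrors the check quoted in the report. Returns how many regions a
    // server with the given load would be asked to take, or 0 if the
    // server is skipped because it has already reached min.
    static int regionsToPut(int load, int min) {
        if (load >= min && load > 0) {
            return 0; // skipped: not treated as underloaded
        }
        int regionsToPut = min - load;
        if (regionsToPut == 0) {
            regionsToPut = 1; // only reachable when load == 0 and min == 0
        }
        return regionsToPut;
    }

    public static void main(String[] args) {
        // Scenario from the report: a 1000-region table on 1600 servers.
        int min = perServerMin(1000, 1600);
        System.out.println("min = " + min);
        System.out.println("empty server receives " + regionsToPut(0, min));
        System.out.println("server with 2 regions receives " + regionsToPut(2, min));
    }
}
```

Under these assumptions, every empty server is treated as underloaded and asks for one region even though its load already equals min, which is consistent with the large number of moves seen in the balance plan.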

This message was sent by Atlassian JIRA
