hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8517) Stochastic Loadbalancer isn't finding steady state on real clusters
Date Fri, 10 May 2013 21:49:15 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Clark updated HBASE-8517:
---------------------------------

    Attachment: HBASE-8517-0.patch

Here's a patch that solves the issue I was seeing on a cluster.

Most of the fix was raising the move cost so that the cost functions ranked below it can't overcome
the cost of a move.  In other words, the balancer will still find the best moves when the region
load is out of whack, but it won't move regions just to make locality better.
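
As a rough illustration of that weighting idea (not the code in the patch), the sketch below sums weighted costs and accepts a plan only when the total drops. The class name, method names, and weight values are all hypothetical, and each input is assumed to be a cost already scaled to [0, 1].

```java
// Hypothetical sketch: weights and names are illustrative, not taken from HBASE-8517-0.patch.
final class MoveCostSketch {
    // Assumed ordering: region load outranks moves, and moves outrank locality.
    static final double REGION_LOAD_WEIGHT = 500.0;
    static final double MOVE_WEIGHT = 100.0;
    static final double LOCALITY_WEIGHT = 25.0;

    /** Weighted sum of three costs, each assumed to be scaled to [0, 1]. */
    static double totalCost(double regionLoadSkew, double moveFraction, double localityLoss) {
        return REGION_LOAD_WEIGHT * regionLoadSkew
                + MOVE_WEIGHT * moveFraction
                + LOCALITY_WEIGHT * localityLoss;
    }

    /** A candidate plan is only worth applying if it lowers the total cost. */
    static boolean shouldApply(double currentCost, double plannedCost) {
        return plannedCost < currentCost;
    }
}
```

With these made-up weights, a plan that moves 20% of the allowed regions only to cut locality loss from 0.4 to 0.1 raises the total and is rejected, while a plan that moves the same number of regions to cut region load skew from 0.3 to 0.05 is accepted. The guard is only as strong as the ratio between the weights, so a very small move fraction could still be outweighed by a locality gain in this toy model.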

Then I also took a crack at making this thing work on larger clusters.  

To do that I made it closer to simulated annealing.  It picks either a random server, or the
most loaded server.  This means that more often we have a chance to work on the worst-balanced
server.
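
A minimal sketch of that server pick, assuming load is measured as a per-server region count and that the choice between "random" and "most loaded" is made with a fixed probability. The names and the probability parameter are hypothetical; the patch may decide differently.

```java
import java.util.Random;

// Hypothetical sketch of the pick step; not the code in the patch.
final class ServerPickSketch {
    /** Index of the server holding the most regions (first one wins on ties). */
    static int mostLoaded(int[] regionsPerServer) {
        int best = 0;
        for (int i = 1; i < regionsPerServer.length; i++) {
            if (regionsPerServer[i] > regionsPerServer[best]) {
                best = i;
            }
        }
        return best;
    }

    /** With the given probability pick the most loaded server, otherwise a uniformly random one. */
    static int pickServer(int[] regionsPerServer, Random rng, double chanceOfMostLoaded) {
        if (rng.nextDouble() < chanceOfMostLoaded) {
            return mostLoaded(regionsPerServer);
        }
        return rng.nextInt(regionsPerServer.length);
    }
}
```

Keeping the random branch preserves the exploratory behaviour of the search, while the other branch makes sure the worst server is visited regularly.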

Then I also made the algorithm prefer to move regions off of the overloaded servers.
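
One simple way to express that preference, shown only as a sketch: given two candidate servers, treat the one holding more regions as the source of the move. The names are hypothetical, and the patch may bias the choice rather than force it as this example does.

```java
// Hypothetical sketch: always moves from the busier of two servers to the other.
final class MoveDirectionSketch {
    /** Returns {source, destination}; the region leaves the server that holds more regions. */
    static int[] orderByLoad(int[] regionsPerServer, int a, int b) {
        if (regionsPerServer[a] >= regionsPerServer[b]) {
            return new int[] {a, b};
        }
        return new int[] {b, a};
    }
}
```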

Then I made the max number of moves scale with cluster size. 
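
A sketch of what scaling the move limit could look like, assuming "cluster size" means the total number of regions and that a fixed floor still applies on small clusters. The method name, the percentage parameter, and the floor are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch: the move limit grows with the number of regions but never drops below a floor.
final class MaxMovesSketch {
    static int maxMoves(int totalRegions, double maxMovePercent, int floor) {
        int scaled = (int) (totalRegions * maxMovePercent);
        return Math.max(scaled, floor);
    }
}
```

With a floor of 600 and a fraction of 0.25, a 100-region cluster keeps the limit of 600, while a 10,000-region cluster is allowed 2,500 moves.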
                
> Stochastic Loadbalancer isn't finding steady state on real clusters
> -------------------------------------------------------------------
>
>                 Key: HBASE-8517
>                 URL: https://issues.apache.org/jira/browse/HBASE-8517
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HBASE-8517-0.patch
>
>
> I have a cluster that runs IT tests.  Last night after all tests were done I noticed
> that the balancer was thrashing regions around.
> The number of regions on each machine is not correct.
> The balancer seems to value the cost of moving a region way too little.
> {code}
> 2013-05-09 16:34:58,920 DEBUG [IPC Server handler 4 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 5367ms to try 8910 different iterations.  Found a solution that moves 37 regions; Going from a computed cost of 56.50254222730425 to a new cost of 11.214035466575254
> 2013-05-09 16:37:48,715 DEBUG [IPC Server handler 7 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 4735ms to try 8910 different iterations.  Found a solution that moves 38 regions; Going from a computed cost of 56.612624531830996 to a new cost of 11.275763861636982
> 2013-05-09 16:38:11,398 DEBUG [IPC Server handler 6 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 4502ms to try 8910 different iterations.  Found a solution that moves 39 regions; Going from a computed cost of 56.50048461413552 to a new cost of 11.225352339003237
> {code}
> Each of those balancer runs was triggered when there was no load on the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
