hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-862) region balancing is clumsy
Date Wed, 03 Sep 2008 05:17:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627907#action_12627907 ]

Billy Pearson commented on HBASE-862:

+1 I see this also. 

I also see MR jobs fail often if I add a region server to the cluster while the job is running.
I think this is sometimes caused by closing regions that are in the middle of a lengthy compaction:
they will not close for a while, so they cannot be redeployed promptly.

What if the request to close a region for rebalancing were different from a normal close
call and gave the region server an option to decline it?
For example, say the master sends a request to close a small group of regions for redeployment and the
region server has one or more of those regions queued up for compaction.
The region server could send a response back to the master declining the regions that are in
the compaction queue, or that have an open scanner on them, etc.
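
To make the decline idea concrete, here is a minimal Python sketch. It is not HBase code: the `RegionServer` class, the `handle_soft_close` method, and the region names are all hypothetical, and it only models the decision a region server might make when the master asks it to give up regions for balancing.

```python
from dataclasses import dataclass, field


@dataclass
class RegionServer:
    """Hypothetical stand-in for a region server's local state."""
    # Regions queued for (or currently running) a compaction.
    compaction_queue: set = field(default_factory=set)
    # Region name -> number of scanners currently open on it.
    open_scanners: dict = field(default_factory=dict)

    def handle_soft_close(self, regions):
        """Split a balancer close request into accepted and declined regions."""
        accepted, declined = [], []
        for region in regions:
            busy = (region in self.compaction_queue
                    or self.open_scanners.get(region, 0) > 0)
            (declined if busy else accepted).append(region)
        return accepted, declined


server = RegionServer(compaction_queue={"r2"},
                      open_scanners={"r3": 1, "r4": 0})
accepted, declined = server.handle_soft_close(["r1", "r2", "r3", "r4"])
print("closing:", accepted)
print("declined:", declined)
```

The master would then redeploy only the accepted regions and could ask again for the declined ones in a later cycle.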

I would also slow the redeployment down to 1-3 regions per cycle, and wait until all of those
regions are open again before moving more.
We might also build some slack into the per-server numbers, so that a region is less likely to be moved
when a server is only 1-3 regions or 1-5% out of balance.
I would like to see the balancer keep everything even, but I would be OK with it leaving
things a little out of balance.
Maybe we can use something like the lease timeout variable from the config to define how often
the balancer runs a cycle.
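
A rough sketch of such a throttled, slack-tolerant balancer cycle follows. Everything in it is an assumption for illustration: the `plan_moves` function, its parameter names, and the choice to take the larger of the region slack and the percentage slack as the tolerance are not taken from HBase.

```python
def plan_moves(region_counts, in_transition=(), max_moves=3,
               slack_regions=3, slack_pct=0.05):
    """Plan at most max_moves (from_server, to_server) moves for one cycle.

    region_counts: dict of server name -> number of regions it holds.
    in_transition: regions moved earlier that have not reopened yet.
    """
    # Wait until everything moved in the previous cycle is open again.
    if in_transition or not region_counts:
        return []
    counts = dict(region_counts)
    avg = sum(counts.values()) / len(counts)
    # Assumed rule: tolerate whichever slack is larger.
    slack = max(slack_regions, slack_pct * avg)
    moves = []
    while len(moves) < max_moves:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        within_slack = (counts[busiest] - avg <= slack
                        and avg - counts[idlest] <= slack)
        # Also stop when a move could only swap the imbalance around.
        if within_slack or counts[busiest] - counts[idlest] < 2:
            break
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))
    return moves


# Four loaded servers plus one freshly added, empty server.
cluster = {"a": 150, "b": 150, "c": 150, "d": 150, "e": 0}
print(plan_moves(cluster))
# A cluster that is only slightly uneven is left alone.
print(plan_moves({"a": 101, "b": 99, "c": 100}))
```

With these defaults the new server fills up a few regions at a time rather than receiving a large share of the cluster at once.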

My down-the-road wish list: one day be able to report back to the master, in the heartbeat, the
load on the regions that a region server holds, and generate read/write load numbers per region/table/server/cluster/etc.
With this data we could be more sophisticated about which regions to move and when.
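
As an illustration of that wish-list item, the sketch below rolls per-region read/write counts up to table, server, and cluster totals. The heartbeat layout (a list of `(table, region, reads, writes)` tuples per server), the `aggregate_load` function, and the sample numbers are all made up for the example and do not reflect any existing HBase message format.

```python
from collections import defaultdict


def aggregate_load(heartbeats):
    """Roll per-region [reads, writes] up to table, server, and cluster.

    heartbeats: dict of server name -> list of
    (table, region, reads, writes) tuples from one heartbeat.
    """
    per_region = {}
    per_table = defaultdict(lambda: [0, 0])
    per_server = defaultdict(lambda: [0, 0])
    cluster = [0, 0]
    for server, regions in heartbeats.items():
        for table, region, reads, writes in regions:
            per_region[region] = (reads, writes)
            for bucket in (per_table[table], per_server[server], cluster):
                bucket[0] += reads
                bucket[1] += writes
    return per_region, dict(per_table), dict(per_server), cluster


heartbeats = {
    "rs1": [("users", "users,a", 120, 30), ("logs", "logs,a", 10, 400)],
    "rs2": [("users", "users,m", 50, 5)],
}
per_region, per_table, per_server, cluster_load = aggregate_load(heartbeats)
# The region with the highest combined read/write load.
hottest = max(per_region, key=lambda r: sum(per_region[r]))
print("per server:", per_server)
print("hottest region:", hottest)
```

A balancer with these numbers could, for example, avoid moving the hottest region or spread hot regions across servers instead of only evening out region counts.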

> region balancing is clumsy
> --------------------------
>                 Key: HBASE-862
>                 URL: https://issues.apache.org/jira/browse/HBASE-862
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
> Daniel Leffel has an install of 500 regions on 4 nodes.  He's running 0.2.0.
> On restart, load balancing is running while the 600 regions are being initially opened.
> Makes for churn.  Load balancing should wait before it cuts in.
> Have also seen on occasion that it will not find equilibrium after a restart.
> Adding a node is catastrophic.  >20% of the regions were closed and were taking the
> longest time to show up on the new server.  I would think that the region balancing would
> work in more sophisticated and gradual manner.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
