hbase-issues mailing list archives

From "Guanghao Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-17178) Add region balance throttling
Date Wed, 30 Nov 2016 01:38:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707188#comment-15707188 ]

Guanghao Zhang edited comment on HBASE-17178 at 11/30/16 1:38 AM:
------------------------------------------------------------------

Review board: https://reviews.apache.org/r/54191/

bq. Move this line out of synchronized
Fixed in v4 patch.

bq. Shall the balancing be affected by other RIT? Assuming RS crash happened in middle of
balancing, shall we wait?
Yes, balancing will be affected by other RIT. This is for availability. If an RS crashes
in the middle of balancing, there will be more regions in transition, so the balancer can't
finish all region plans. The cluster needs another round of balancing to reach a balanced state.

bq.  the code flow of balancer might block here and not controlled by the cutoffTime?
Fixed in v4 patch. The sleep has to end once the cutoff time is exceeded.
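
To illustrate the point about the cutoff time, here is a minimal sketch of a throttle wait that never sleeps past the cutoff. It is not the code from the v4 patch: the class CutoffAwareThrottle, the method waitForSlot, and all parameter names are hypothetical, and the current number of regions in transition is passed in as a plain supplier.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not the HBASE-17178 patch code.
final class CutoffAwareThrottle {

    /**
     * Waits until fewer than maxRit regions are in transition, or until the
     * cutoff time passes, whichever comes first.
     *
     * @param ritCount     supplies the current number of regions in transition (assumed)
     * @param maxRit       max regions in transition allowed while balancing
     * @param cutoffTimeMs absolute cutoff, in System.currentTimeMillis() terms
     * @param pollMs       how long to sleep between checks
     * @return true if a slot is free, false if the cutoff was hit or the thread was interrupted
     */
    static boolean waitForSlot(IntSupplier ritCount, int maxRit, long cutoffTimeMs, long pollMs) {
        while (ritCount.getAsInt() >= maxRit) {
            long remaining = cutoffTimeMs - System.currentTimeMillis();
            if (remaining <= 0) {
                // Cutoff exceeded: stop waiting instead of blocking the balancer.
                return false;
            }
            try {
                // Never sleep past the cutoff time.
                Thread.sleep(Math.min(pollMs, remaining));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Capping each sleep at the remaining time keeps the wait bounded by the cutoff even when the number of regions in transition never drops.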





was (Author: zghaobac):
Review board: https://reviews.apache.org/r/54191/

bq. Move this line out of synchronized
Fixed in v4 patch.

bq. Shall the balancing be affected by other RIT? Assuming RS crash happened in middle of
balancing, shall we wait?
Yes, balancing will be affected by other RIT. This is for availability. If an RS crashes
in the middle of balancing, there will be more regions in transition, so the balancer can't
finish all region plans. The cluster needs another round of balancing to reach a balanced state.

bq.  the code flow of balancer might block here and not controlled by the cutoffTime?
Fixed in v4 patch. The sleep has to end once the cutoff time is exceeded.

Review board: https://reviews.apache.org/r/54191/



> Add region balance throttling
> -----------------------------
>
>                 Key: HBASE-17178
>                 URL: https://issues.apache.org/jira/browse/HBASE-17178
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>         Attachments: HBASE-17178-v1.patch, HBASE-17178-v2.patch, HBASE-17178-v3.patch, HBASE-17178-v4.patch
>
>
> Our online cluster serves dozens of tables, and different tables serve different services.
> If the balancer moves too many regions at the same time, it will decrease availability for
> some tables or services. So we added region balance throttling on our online serving cluster.
> We introduce a new config, hbase.balancer.max.balancing.regions, which sets the max number
> of regions in transition when balancing.
> If we set this to 1 and a table has 100 regions, then the table will have at least 99 regions
> available at any time. It helps a lot for our use case and has been running for a long time
> on our production cluster.
> But for some use cases, we need the balancer to run faster. If a cluster has 100 regionservers
> and then adds 50 new regionservers for peak requests, it needs the balancer to run as soon as
> possible so that the cluster reaches a balanced state quickly. Our idea is to compute the max
> number of regions in transition from the max balancing time and the average time of a region
> in transition. The balancer then uses the computed value for throttling.
> Examples for understanding:
> A cluster has 100 regionservers, each regionserver has 200 regions, the average time of a
> region in transition is 1 second, and we configure the max balancing time as 10 * 60 seconds.
> Case 1. One regionserver crashes; the cluster needs to balance at most 200 regions. Then
> 200 / (10 * 60s / 1s) < 1, which means the max number of regions in transition is 1 when
> balancing. The balancer can then move regions one by one, and the cluster will have high
> availability when balancing.
> Case 2. Add another 100 regionservers; the cluster needs to balance at most 10000 regions.
> Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of regions in transition is
> 17 when balancing. The cluster can then reach a balanced state within the max balancing time.
> Any suggestions are welcome.
> Review board: https://reviews.apache.org/r/54191/
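
The arithmetic in the two cases of the issue description can be sketched as a small Java helper. This is only an illustration of the formula as described there, not the patch itself: the class BalanceThrottleSketch, the method maxRegionsInTransition, and the parameter names are hypothetical, and rounding up with a floor of 1 is an assumption inferred from the examples (0.33 becomes 1, 16.7 becomes 17).

```java
// Hypothetical sketch of the formula in the description, not the patch code.
final class BalanceThrottleSketch {

    /**
     * Computes the max number of regions in transition while balancing as
     * regionsToMove / (maxBalancingTime / avgRitTime), rounded up, at least 1.
     *
     * @param regionsToMove       upper bound on regions the balancer must move
     * @param maxBalancingTimeSec configured max balancing time, in seconds
     * @param avgRitTimeSec       average time a region spends in transition, in seconds
     */
    static int maxRegionsInTransition(int regionsToMove, long maxBalancingTimeSec,
                                      long avgRitTimeSec) {
        if (maxBalancingTimeSec <= 0 || avgRitTimeSec <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        // How many regions one "slot" can move back to back within the time budget.
        double movesPerSlot = (double) maxBalancingTimeSec / avgRitTimeSec;
        // Slots needed to move everything in time; always allow at least one.
        return Math.max(1, (int) Math.ceil(regionsToMove / movesPerSlot));
    }
}
```

With the numbers from the description, maxRegionsInTransition(200, 600, 1) gives 1 (Case 1) and maxRegionsInTransition(10000, 600, 1) gives 17 (Case 2).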



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
