Date: Mon, 28 Nov 2016 05:19:58 +0000 (UTC)
From: "Guanghao Zhang (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17178) Add region balance throttling


    [ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15700949#comment-15700949 ] 

Guanghao Zhang commented on HBASE-17178:
----------------------------------------

We have been using hbase.balancer.max.balancing.regions for a long time and set it to 1 on our online cluster. But recently we found that this value is too small for some use cases, so we plan to add a more automatic throttling strategy.
bq. so I guess we could just reuse it?
Yes, we plan to reuse it.
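A minimal sketch of how the existing limit throttles a balancing run, assuming the balancer's region moves are issued in waves and each wave finishes before the next one starts. ThrottledBalanceSketch, planWaves and minAvailableRegions are hypothetical names, not HBase code, and regions are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not HBase balancer code): split region moves into
// waves so that at most maxBalancingRegions regions are in transition at once.
final class ThrottledBalanceSketch {

    /** Splits the regions to move into consecutive waves of at most maxBalancingRegions each. */
    static List<List<String>> planWaves(List<String> regionsToMove, int maxBalancingRegions) {
        if (maxBalancingRegions < 1) {
            throw new IllegalArgumentException("maxBalancingRegions must be >= 1");
        }
        List<List<String>> waves = new ArrayList<>();
        for (int i = 0; i < regionsToMove.size(); i += maxBalancingRegions) {
            int end = Math.min(i + maxBalancingRegions, regionsToMove.size());
            waves.add(new ArrayList<>(regionsToMove.subList(i, end)));
        }
        return waves;
    }

    /**
     * Worst-case number of a table's regions that stay online while a wave is in flight,
     * assuming every region of the largest wave belongs to that table.
     */
    static int minAvailableRegions(int tableRegions, List<List<String>> waves) {
        int largestWave = 0;
        for (List<String> wave : waves) {
            largestWave = Math.max(largestWave, wave.size());
        }
        return tableRegions - largestWave;
    }

    public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            regions.add("region-" + i);
        }
        List<List<String>> waves = planWaves(regions, 1);
        // With a limit of 1 and a 100-region table: 100 waves, at least 99 regions available.
        System.out.println(waves.size() + " waves, at least "
                + minAvailableRegions(100, waves) + " regions available");
    }
}
```

With a limit of 1 this reproduces the "99 of 100 regions available" behaviour described in the issue; raising the limit shortens the run at the cost of more regions being offline at once.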
bq. Regarding this "average time of RIT", is it recorded and computed automatically?
I think it can be recorded and computed by the AssignmentManager. I will try to upload a patch today. Thanks.
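A hypothetical sketch of the bookkeeping the AssignmentManager could do, assuming it learns the duration of each region transition when the region leaves transition. RitTimeTracker and its methods are illustrative names, not existing HBase APIs. A plain cumulative mean is used here; a moving or exponentially weighted average would be a reasonable alternative if recent transitions should count more.

```java
// Hypothetical sketch (not an HBase API): accumulate finished
// region-in-transition durations and expose their average.
final class RitTimeTracker {
    private long completedTransitions;
    private long totalTransitionMillis;

    /** Records one finished region transition that took durationMillis. */
    synchronized void recordTransition(long durationMillis) {
        if (durationMillis < 0) {
            throw new IllegalArgumentException("durationMillis must be >= 0");
        }
        completedTransitions++;
        totalTransitionMillis += durationMillis;
    }

    /** Average transition time in milliseconds, or defaultMillis if nothing has been recorded yet. */
    synchronized long averageMillis(long defaultMillis) {
        return completedTransitions == 0 ? defaultMillis : totalTransitionMillis / completedTransitions;
    }

    public static void main(String[] args) {
        RitTimeTracker tracker = new RitTimeTracker();
        tracker.recordTransition(800);
        tracker.recordTransition(1200);
        tracker.recordTransition(1000);
        System.out.println(tracker.averageMillis(1000)); // prints 1000
    }
}
```

The default value covers a freshly started master that has not observed any transitions yet.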

> Add region balance throttling
> -----------------------------
>
>                 Key: HBASE-17178
>                 URL: https://issues.apache.org/jira/browse/HBASE-17178
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>
> Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time,
> it decreases the availability of some tables or services. So we added region balance throttling to our online serving cluster.
> We introduced a new config, hbase.balancer.max.balancing.regions, which sets the maximum number of regions in transition while balancing.
> If we set this to 1 and a table has 100 regions, then the table will have at least 99 regions available at any time. It helps a lot for our use case, and it has been running for a long time
> on our production cluster.
> But for some use cases, we need the balancer to run faster. If a cluster has 100 regionservers and then adds 50 new regionservers for peak requests, the balancer needs to run as soon as
> possible so that the cluster reaches a balanced state quickly. Our idea is to compute the max number of regions in transition from the max balancing time and the average time of a region in transition.
> The balancer then uses the computed value for throttling.
> Examples:
> A cluster has 100 regionservers, each regionserver has 200 regions, the average time of a region in transition is 1 second, and we configure the max balancing time as 10 * 60 seconds.
> Case 1. One regionserver crashes, so the cluster needs to balance at most 200 regions. Since 200 / (10 * 60s / 1s) < 1, the max number of regions in transition while balancing is 1. The balancer then moves regions one by one, and the cluster keeps high availability while balancing.
> Case 2. Another 100 regionservers are added, so the cluster needs to balance at most 10000 regions (each of the original 100 regionservers sheds 100 of its 200 regions). Since 10000 / (10 * 60s / 1s) = 16.7, the max number of regions in transition while balancing is 17 (rounded up). The cluster can then reach a balanced state within the max balancing time.
> Any suggestions are welcome.
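The proposed formula can be checked against both cases with a small sketch. It assumes maxRIT = max(1, ceil(regionsToMove * avgRitTime / maxBalancingTime)), which is the same quantity as regionsToMove / (maxBalancingTime / avgRitTime) in the description, rounded up with a floor of 1. BalanceThrottleSketch, maxRegionsInTransition and the millisecond units are assumptions for illustration, not HBase code.

```java
// Hypothetical sketch of the proposed throttle formula (not HBase code):
//   maxRIT = max(1, ceil(regionsToMove * avgRitMillis / maxBalancingMillis))
final class BalanceThrottleSketch {

    static int maxRegionsInTransition(long regionsToMove, long maxBalancingMillis, long avgRitMillis) {
        if (maxBalancingMillis <= 0 || avgRitMillis <= 0 || regionsToMove < 0) {
            throw new IllegalArgumentException("times must be positive and regionsToMove non-negative");
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        long numerator = regionsToMove * avgRitMillis;
        long ceil = (numerator + maxBalancingMillis - 1) / maxBalancingMillis;
        // Always allow at least one region in transition so balancing can progress.
        return (int) Math.max(1L, ceil);
    }

    public static void main(String[] args) {
        long maxBalancingMillis = 10 * 60 * 1000L; // 10 minutes
        long avgRitMillis = 1000L;                 // 1 second per region in transition
        // Case 1: one regionserver with 200 regions crashes.
        System.out.println(maxRegionsInTransition(200, maxBalancingMillis, avgRitMillis));   // 1
        // Case 2: 100 more regionservers join; 10000 regions move.
        System.out.println(maxRegionsInTransition(10000, maxBalancingMillis, avgRitMillis)); // 17
    }
}
```

The floor of 1 matters when few regions need to move: without it, Case 1 would round to 0 and nothing would ever be balanced.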

