Date: Mon, 28 Nov 2016 05:19:58 +0000 (UTC)
From: "Guanghao Zhang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17178) Add region balance throttling
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm

[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15700949#comment-15700949 ]

Guanghao Zhang commented on HBASE-17178:
----------------------------------------

We have been using hbase.balancer.max.balancing.regions for a long time, with it set to 1 on our online cluster.
But recently we found that this is too small for some use cases, so we plan to add a more automatic throttling strategy.

bq. so I guess we could just reuse it?
Yeah, we plan to use it.

bq. Regarding this "average time of RIT", is it recorded and computed automatically?
I think it can be recorded and computed by AssignmentManager. I will try to upload a patch today. Thanks.

> Add region balance throttling
> -----------------------------
>
>                 Key: HBASE-17178
>                 URL: https://issues.apache.org/jira/browse/HBASE-17178
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>
> Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time, it decreases availability for some tables or services. So we added region balance throttling on our online serving cluster.
> We introduced a new config, hbase.balancer.max.balancing.regions, which sets the max number of regions in transition when balancing.
> If we set this to 1 and a table has 100 regions, then the table will have at least 99 regions available at any time. This has helped a lot for our use case, and it has been running for a long time on our production cluster.
> But for some use cases, we need the balancer to run faster. Suppose a cluster has 100 regionservers and adds 50 new regionservers for peak requests. Then the balancer needs to run as soon as possible so that the cluster reaches a balanced state soon. Our idea is to compute the max number of regions in transition from the max balancing time and the average time of a region in transition. The balancer then uses the computed value for throttling.
> Examples for understanding.
> A cluster has 100 regionservers, each regionserver has 200 regions, the average time of a region in transition is 1 second, and the max balancing time is configured as 10 * 60 seconds.
> Case 1. One regionserver crashes, so the cluster needs to balance at most 200 regions. Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in transition is 1 when balancing. The balancer can then move regions one by one, and the cluster keeps high availability while balancing.
> Case 2. Another 100 regionservers are added, so the cluster needs to balance at most 10000 regions. Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of regions in transition is 17 when balancing. The cluster can then reach a balanced state within the max balancing time.
> Any suggestions are welcomed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
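
The arithmetic in Case 1 and Case 2 can be checked with a small sketch. This is only an illustration of the formula described in the issue, not code from HBase or from the proposed patch: the function name and parameter names are hypothetical, and rounding up with a floor of 1 is an assumption inferred from the two cases (a result below 1 gives 1, and 16.7 gives 17).

```python
import math


def max_regions_in_transition(regions_to_move: int,
                              max_balancing_time_s: float,
                              avg_rit_time_s: float) -> int:
    """Illustrative throttle: how many regions may be in transition at once
    so that `regions_to_move` moves fit within `max_balancing_time_s`.

    All names here are hypothetical; they are not HBase APIs or configs.
    """
    if max_balancing_time_s <= 0 or avg_rit_time_s <= 0:
        raise ValueError("times must be positive")
    if regions_to_move < 0:
        raise ValueError("regions_to_move must be non-negative")

    # How many moves can complete back to back within the time budget
    # when only one region is in transition at a time.
    sequential_moves = max_balancing_time_s / avg_rit_time_s

    # Assumption: round up so the deadline is met, and never go below 1
    # so the balancer can always make progress.
    return max(1, math.ceil(regions_to_move / sequential_moves))


if __name__ == "__main__":
    # Case 1: one regionserver crashes, at most 200 regions to balance.
    print(max_regions_in_transition(200, 10 * 60, 1))
    # Case 2: 100 regionservers added, at most 10000 regions to balance.
    print(max_regions_in_transition(10000, 10 * 60, 1))
```

With the numbers from the issue description, the first call yields 1 and the second yields 17, matching the two cases.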