Date: Mon, 28 Nov 2016 11:25:58 +0000 (UTC)
From: "Guanghao Zhang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-17178) Add region balance throttling

[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-17178:
-----------------------------------
    Attachment: HBASE-17178-v1.patch

Attached a v1 patch. It only adds the region-in-transition time to the AssignmentManager metrics, and adds a new config for the region-in-transition time.
> Add region balance throttling
> -----------------------------
>
>                 Key: HBASE-17178
>                 URL: https://issues.apache.org/jira/browse/HBASE-17178
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>         Attachments: HBASE-17178-v1.patch
>
>
> Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time, it decreases the availability of some tables or services. So we added region balance throttling to our online serving cluster.
> We introduce a new config, hbase.balancer.max.balancing.regions, which sets the max number of regions in transition while balancing.
> If we set this to 1 and a table has 100 regions, then the table will have at least 99 regions available at any time. This helps a lot in our use case, and it has been running for a long time on our production cluster.
> But in some use cases we need the balancer to run faster. If a cluster has 100 regionservers and then adds 50 new regionservers for peak requests, it needs the balancer to run as soon as possible so that the cluster reaches a balanced state quickly. Our idea is to compute the max number of regions in transition from the max balancing time and the average region-in-transition time. The balancer then uses the computed value to throttle.
> Examples for understanding:
> A cluster has 100 regionservers, each regionserver has 200 regions, the average region-in-transition time is 1 second, and we configure the max balancing time as 10 * 60 seconds.
> Case 1. One regionserver crashes, so the cluster needs to balance at most 200 regions. Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in transition is 1 while balancing. The balancer can move regions one by one, and the cluster keeps high availability while balancing.
> Case 2. Another 100 regionservers are added, so the cluster needs to balance at most 10000 regions. Then 10000 / (10 * 60s / 1s) = 16.7, which, rounded up, means the max number of regions in transition is 17 while balancing. The cluster can then reach a balanced state within the max balancing time.
> Any suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
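The throttle computation from the two cases in the description can be sketched in Java as follows. This is a minimal, hypothetical sketch, not code from the HBASE-17178 patch: the class and method names are illustrative and are not HBase APIs. It assumes the cap is the number of regions to move divided by the number of sequential transition "rounds" that fit in the max balancing time, rounded up and never below 1, which reproduces both worked cases (1 and 17).

```java
// Hypothetical sketch of the throttle formula described in HBASE-17178.
// Names are illustrative only; this is not the HBase balancer implementation.
public final class BalanceThrottleSketch {

    /**
     * Computes the max number of regions allowed in transition at once.
     *
     * @param regionsToMove  number of regions the balancer plans to move
     * @param maxBalancingMs configured upper bound on the total balancing time
     * @param avgRitMs       observed average region-in-transition time
     * @return the in-flight cap, rounded up and never less than 1
     */
    static int maxRegionsInTransition(long regionsToMove, long maxBalancingMs, long avgRitMs) {
        if (maxBalancingMs <= 0 || avgRitMs <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        // How many back-to-back transition rounds fit into the time budget.
        double rounds = (double) maxBalancingMs / avgRitMs;
        // Regions that must be in flight per round to finish within the budget.
        // Rounding up keeps the total within the budget; the floor of 1 lets
        // small plans move regions one by one.
        return (int) Math.max(1, Math.ceil(regionsToMove / rounds));
    }

    public static void main(String[] args) {
        long maxBalancingMs = 10 * 60 * 1000L; // 10 * 60 seconds
        long avgRitMs = 1000L;                 // 1 second per region in transition
        // Case 1: one regionserver crashes, at most 200 regions to move.
        System.out.println(maxRegionsInTransition(200, maxBalancingMs, avgRitMs));   // prints 1
        // Case 2: 100 regionservers added, at most 10000 regions to move.
        System.out.println(maxRegionsInTransition(10000, maxBalancingMs, avgRitMs)); // prints 17
    }
}
```

Under these assumptions a static setting such as hbase.balancer.max.balancing.regions = 1 corresponds to always returning 1, while the computed cap grows only when the plan would otherwise overrun the max balancing time.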