Date: Thu, 19 Oct 2017 22:10:00 +0000 (UTC)
From: "Jerry He (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-19021) Restore a few important missing logics for balancer in 2.0

[ https://issues.apache.org/jira/browse/HBASE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211831#comment-16211831 ]

Jerry He edited comment on HBASE-19021 at 10/19/17 10:09 PM:
-------------------------------------------------------------

More explanation.

In branch-1, RegionStates.getAssignmentsByTable()
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L1115
has a part that deals with servers without assignments and with draining mode. This part is missing after AMv2.

Draining mode nonetheless works in AMv2, after a "detour". The balancer's balanceCluster() can pick a plan that moves regions to draining servers, and those regions are then unassigned. In the assign phase, however, the retainAssignment check validates the plan against the server list obtained from ServerManager.createDestinationServersList(). That list already excludes the draining servers, so a draining server does not end up as the final destination. It is a detour, but the end result is correct.
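The detour can be pictured with a minimal sketch. It uses a simplified model in which servers are plain strings. DetourSketch and resolveDestination are hypothetical names, not HBase classes. The random fallback is an assumption that only approximates what retainAssignment does when the planned server is not in the destination list.

```java
import java.util.List;
import java.util.Random;

/**
 * Hypothetical, simplified model of the "detour": a balancer plan may target a
 * draining server, but at assign time the target is checked against a destination
 * list that already excludes draining servers. Not HBase code.
 */
final class DetourSketch {
  private DetourSketch() {}

  /**
   * Keeps the planned destination if it is still in the destination list;
   * otherwise falls back to a random server from that list (assumed policy).
   */
  static String resolveDestination(String plannedServer, List<String> destinationServers, Random rnd) {
    if (destinationServers.isEmpty()) {
      throw new IllegalStateException("no destination servers available");
    }
    if (destinationServers.contains(plannedServer)) {
      return plannedServer; // the plan is retained as-is
    }
    // The planned server (e.g. a draining one) is not a valid destination: pick another.
    return destinationServers.get(rnd.nextInt(destinationServers.size()));
  }
}
```

The end result matches the comment's point: whatever the plan says, the region lands on a server from the filtered list.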
Still, I restored the branch-1 behavior, which takes the draining servers out of consideration from the beginning.

The balancer's retainAssignment, randomAssignment and roundRobinAssignment all take a server list as a parameter. We seem to always pass the list returned by ServerManager.createDestinationServersList(), so those calls are fine. Only the big balanceCluster() call has the issue.
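Here is a minimal sketch of the restored branch-1 step. It assumes a simplified model where tables, servers and regions are plain strings. DrainingAwareAssignments and prepareForBalance are hypothetical names. This is not the actual RegionStates code. It only illustrates the two rules applied to the table -> server -> regions map before balanceCluster() sees it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of preparing the balancer input; not HBase code. */
final class DrainingAwareAssignments {
  private DrainingAwareAssignments() {}

  /**
   * Returns a copy of tableToServerRegions in which every online server appears
   * (servers without regions get an empty list) and draining servers are removed.
   * The input map is not modified.
   */
  static Map<String, Map<String, List<String>>> prepareForBalance(
      Map<String, Map<String, List<String>>> tableToServerRegions,
      Collection<String> onlineServers,
      Collection<String> drainingServers) {
    Map<String, Map<String, List<String>>> result = new HashMap<>();
    for (Map.Entry<String, Map<String, List<String>>> table : tableToServerRegions.entrySet()) {
      // Copy the per-server lists so the caller's map stays untouched.
      Map<String, List<String>> perServer = new HashMap<>();
      for (Map.Entry<String, List<String>> e : table.getValue().entrySet()) {
        perServer.put(e.getKey(), new ArrayList<>(e.getValue()));
      }
      // Servers with no assignments still count as balance targets.
      for (String server : onlineServers) {
        perServer.putIfAbsent(server, new ArrayList<>());
      }
      // Draining servers (and the regions listed under them) are dropped from consideration.
      perServer.keySet().removeAll(drainingServers);
      result.put(table.getKey(), perServer);
    }
    return result;
  }
}
```

Because draining servers are removed before balancing, balanceCluster() can no longer propose them as destinations, so the detour is not needed.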
> Restore a few important missing logics for balancer in 2.0
> ----------------------------------------------------------
>
>                 Key: HBASE-19021
>                 URL: https://issues.apache.org/jira/browse/HBASE-19021
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Critical
>         Attachments: HBASE-19021-master.patch, HBASE-19021-master.patch
>
>
> After looking at the code and doing some testing, I see that the following things are missing for the balancer to work properly after AMv2.
> # hbase.master.loadbalance.bytable is not respected. Balancing is always by table, whereas the previous default was cluster-wide, not by table.
> # Servers with no assignments are not added for balance consideration.
> # A crashed server is not removed from the in-memory server map in RegionStates, which affects balancing.
> # The draining marker is not respected when balancing.
> Also try to re-enable {{TestRegionRebalancing}}, which has a {{testRebalanceOnRegionServerNumberChange}} test.
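Items 1 and 3 of the list can be illustrated with a hypothetical sketch. BalanceGrouping, Region and CLUSTER_WIDE are illustrative names, not HBase APIs. The byTable flag stands in for the value of hbase.master.loadbalance.bytable. Regions are reduced to (table, name) pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of grouping the balancer input; not HBase code. */
final class BalanceGrouping {
  /** Hypothetical key for the single cluster-wide group; not an HBase constant. */
  static final String CLUSTER_WIDE = "<cluster-wide>";

  private BalanceGrouping() {}

  /** A region as a (table, region name) pair; a stand-in for HBase's region info. */
  static final class Region {
    final String table;
    final String name;

    Region(String table, String name) {
      this.table = table;
      this.name = name;
    }
  }

  /**
   * Groups server -> regions into the per-group maps the balancer works on.
   * When byTable is false (the earlier default), all regions go into one
   * cluster-wide group; when true, there is one group per table.
   */
  static Map<String, Map<String, List<Region>>> group(
      Map<String, List<Region>> serverToRegions, boolean byTable) {
    Map<String, Map<String, List<Region>>> result = new HashMap<>();
    for (Map.Entry<String, List<Region>> e : serverToRegions.entrySet()) {
      String server = e.getKey();
      for (Region r : e.getValue()) {
        String key = byTable ? r.table : CLUSTER_WIDE;
        result.computeIfAbsent(key, k -> new HashMap<>())
            .computeIfAbsent(server, s -> new ArrayList<>())
            .add(r);
      }
    }
    return result;
  }

  /** Drops a crashed server from the in-memory map so it is no longer a balance target. */
  static void removeCrashedServer(Map<String, List<Region>> serverToRegions, String crashed) {
    serverToRegions.remove(crashed);
  }
}
```

Servers that hold no regions do not show up in this grouping. Making them visible is item 2 of the list.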