Return-Path:
X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 4B25F1851B
	for ; Fri, 9 Oct 2015 19:17:06 +0000 (UTC)
Received: (qmail 1011 invoked by uid 500); 9 Oct 2015 19:17:06 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 948 invoked by uid 500); 9 Oct 2015 19:17:06 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 918 invoked by uid 99); 9 Oct 2015 19:17:05 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 09 Oct 2015 19:17:05 +0000
Date: Fri, 9 Oct 2015 19:17:05 +0000 (UTC)
From: "Stephen Yuan Jiang (JIRA)"
To: issues@hbase.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HBASE-14536) Balancer & SSH interfering with each other leading to unavailability
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951037#comment-14951037 ]

Stephen Yuan Jiang commented on HBASE-14536:
--------------------------------------------

[~stack] and [~enis], what do you think of this proposal? Are there any holes you can think of?

> Balancer & SSH interfering with each other leading to unavailability
> --------------------------------------------------------------------
>
> Key: HBASE-14536
> URL: https://issues.apache.org/jira/browse/HBASE-14536
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 1.1.2
> Reporter: Devaraj Das
> Assignee: Stephen Yuan Jiang
> Fix For: 1.1.4
>
> Attachments: HBASE-14536.draft-branch-1.patch, master-log.tgz
>
>
> Came across this in our cluster:
> 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340
> {noformat}
> 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56]
> master.RegionStates: Onlined 1588230740 on
> 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME =>
> 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
> {noformat}
> 2. The server dies at some point:
> {noformat}
> 2015-09-29 06:18:25,952 INFO [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [10.0.0.149,16020,1443507203340]
> 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current
> region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 server being checked:
> 10.0.0.149,16020,1443507203340
> {noformat}
> 3. The balancer had computed a plan that contained a move for the meta:
> {noformat}
> 2015-09-29 06:18:26,833 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster:
> balance hri=hbase:meta,,1.1588230740,
> src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905
> {noformat}
> 4. The following ensues after this, leading to the meta remaining unassigned:
> {noformat}
> 2015-09-29 06:18:26,859 DEBUG [B.defaultRpcServer.handler=12,queue=0,port=16000]
> master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to
> unassign since it's on a dead server: 10.0.0.149,16020,1443507203340
> ......................
> 2015-09-29 06:18:26,899 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates:
> Offlined 1588230740 from 10.0.0.149,16020,1443507203340
> .....................
> 2015-09-29 06:18:26,914 INFO [B.defaultRpcServer.handler=12,queue=0,port=16000]
> master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is
> on a dead but not processed yet server: 10.0.0.149,16020,1443507203340
> ....................
> 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted,
> state: {1588230740 state=OFFLINE, ts=1443507506914,
> server=10.0.0.149,16020,1443507203340}
> ....................
> 2015-09-29 06:18:29,447 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: based on AM, current
> region=hbase:meta,,1.1588230740 is on server=null server being checked:
> 10.0.0.149,16020,1443507203340
> 2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS-
> 10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been
> assigned to otherwhere, skip assigning.
> 2015-09-29 06:18:29,452 DEBUG [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2]
> master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)