Date: Fri, 6 Dec 2013 22:31:39 +0000 (UTC)
From: "Jeffrey Zhong (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts

    [ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841805#comment-13841805 ]

Jeffrey Zhong commented on HBASE-10085:
---------------------------------------

I checked in at the same time as your comments arrived. The committed patch has updated formatting, which comes from our Apache template's auto-formatting. The reason for restarting the whole cluster is that I need to trigger SSH on both old RSs (the source RS and the destination RS in a region assignment) to reproduce the exact issue and verify the fix.

> Some regions aren't re-assigned after a master restarts
> -------------------------------------------------------
>
>                 Key: HBASE-10085
>                 URL: https://issues.apache.org/jira/browse/HBASE-10085
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.96.1
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: hbase-10085.patch
>
>
> We saw this issue during a cluster restart:
> 1) When the cluster is shut down, some regions are left in the offline state because no region servers are available (the RSs are stopped first, then the Master).
> 2) When the cluster restarts, the offlined regions are forced offline again, and SSH skips re-assigning them in the function AM.processServerShutdown, as shown below.
> {code}
> 2013-12-03 10:41:56,686 INFO  [master:h2-ubuntu12-sec-1386048659-hbase-8:60000] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE
> 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:60000] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline
> ...
> 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696}
> ...
> 2013-12-03 10:41:57,223 WARN  [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:60000-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696}
> {code}
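
The two steps in the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: `OfflineRegionSketch`, `RegionInfo`, `regionsToReassign`, the `skipOffline` flag, and the region and server names are all hypothetical, and the skip condition is an assumed stand-in for whatever check in AM.processServerShutdown causes the reported symptom. It shows only how a shutdown handler that ignores regions already marked OFFLINE would leave them unassigned after a restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OfflineRegionSketch {
    // Simplified region states; the real state machine has many more.
    enum State { OPEN, OFFLINE }

    // Hypothetical stand-in for a region's in-memory state (not an HBase class).
    static final class RegionInfo {
        final State state;
        final String lastServer;

        RegionInfo(State state, String lastServer) {
            this.state = state;
            this.lastServer = lastServer;
        }
    }

    // Toy shutdown handling: returns the regions that would be re-assigned
    // after deadServer is declared dead.
    // ASSUMPTION: skipOffline=true imitates the reported symptom, where
    // regions already in OFFLINE state are passed over.
    static List<String> regionsToReassign(Map<String, RegionInfo> regions,
                                          String deadServer,
                                          boolean skipOffline) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, RegionInfo> entry : regions.entrySet()) {
            RegionInfo info = entry.getValue();
            if (!deadServer.equals(info.lastServer)) {
                continue; // region was not on the dead server
            }
            if (skipOffline && info.state == State.OFFLINE) {
                continue; // the region is skipped and stays unassigned
            }
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, RegionInfo> regions = new LinkedHashMap<>();
        regions.put("region-a", new RegionInfo(State.OFFLINE, "rs1")); // forced offline
        regions.put("region-b", new RegionInfo(State.OPEN, "rs1"));
        regions.put("region-c", new RegionInfo(State.OPEN, "rs2"));

        // Symptom: the OFFLINE region on the dead server is not picked up.
        System.out.println(regionsToReassign(regions, "rs1", true));  // [region-b]
        // Desired behavior: it is re-assigned with the others.
        System.out.println(regionsToReassign(regions, "rs1", false)); // [region-a, region-b]
    }
}
```

In this model, region-a corresponds to the region in the log above: it is recorded as OFFLINE on a dead server, so a handler that filters on state never queues it for assignment.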