Date: Sat, 5 Dec 2015 06:46:10 +0000 (UTC)
From: "Shuaifeng Zhou (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-14931) Active master switches may cause region close forever

Shuaifeng Zhou created HBASE-14931:
--------------------------------------

             Summary: Active master switches may cause region close forever
                 Key: HBASE-14931
                 URL: https://issues.apache.org/jira/browse/HBASE-14931
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.98.10
            Reporter: Shuaifeng Zhou
            Priority: Critical
             Fix For: 0.98.17

The master web page (port 60010) shows a region as online on one RS, but accessing data in that region throws NotServingRegionException.
After going through the source code and logs, I found that this happens when the active master switches while the region is opening:

1. master1 opens region 'region1': it sends an open-region request to the RS and creates the node in ZK.
2. master1 stops.
3. master2 becomes the active master.
4. master2 obtains the status of all regions; the status of 'region1' is offline.
5. The RS opens 'region1', changes the node in ZK to opened, and sends a message to master2.
6. master2 receives RS_ZK_REGION_OPENED, but its state for the region is not PENDING_OPEN or OPENING, so it sends an unassign to the RS and 'region1' is closed:
{code:title=AssignmentManager.java|borderStyle=solid}
case RS_ZK_REGION_OPENED:
  // Should see OPENED after OPENING but possible after PENDING_OPEN.
  if (regionState == null
      || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
    LOG.warn("Received OPENED for " + prettyPrintedRegionName
      + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
      + regionStates.getRegionState(encodedName));

    if (regionState != null) {
      // Close it without updating the internal region states,
      // so as not to create double assignments in unlucky scenarios
      // mentioned in OpenRegionHandler#process
      unassign(regionState.getRegion(), null, -1, null, false, sn);
    }
    return;
  }
{code}
7. master2 continues processing the region info left over from when master1 stopped, finds that the status of 'region1' in ZK is opened, and updates the in-memory status to opened.
8. As a result, 'region1' is shown as opened on the master status web page, but it is not open on any regionserver.
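The ordering of events in the steps above can be replayed with a small toy model. This is only a sketch: ToyMaster, ToyRegionServer, Outcome and simulate() are hypothetical names made up for illustration, not HBase classes, and ZooKeeper is reduced to a single variable. It assumes, as in the description, that the unassign in step 6 does not touch the master's in-memory state and that step 7 copies the ZK state as-is.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replay of the event ordering in HBASE-14931; not HBase code. */
final class RegionCloseRaceSketch {

    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN }

    /** Hypothetical stand-in for a regionserver: records what it really serves. */
    static final class ToyRegionServer {
        private final Map<String, Boolean> serving = new HashMap<>();
        void open(String region) { serving.put(region, true); }
        void close(String region) { serving.put(region, false); }
        boolean isServing(String region) { return serving.getOrDefault(region, false); }
    }

    /** Hypothetical stand-in for the new active master's in-memory region states. */
    static final class ToyMaster {
        final Map<String, State> regionStates = new HashMap<>();

        // Mimics the quoted RS_ZK_REGION_OPENED branch: if the region is not
        // PENDING_OPEN/OPENING, close it on the RS and leave in-memory state alone.
        void onRegionOpened(String region, ToyRegionServer rs) {
            State s = regionStates.get(region);
            if (s != State.PENDING_OPEN && s != State.OPENING) {
                if (s != null) {
                    rs.close(region);
                }
                return;
            }
            regionStates.put(region, State.OPEN);
        }

        // Step 7 (assumed behaviour): failover processing trusts the ZK state.
        void processFailoverNode(String region, State zkState) {
            regionStates.put(region, zkState);
        }
    }

    /** What the master web page shows versus what the RS actually serves. */
    static final class Outcome {
        final State masterView;
        final boolean servedByRs;
        Outcome(State masterView, boolean servedByRs) {
            this.masterView = masterView;
            this.servedByRs = servedByRs;
        }
    }

    static Outcome simulate() {
        String region = "region1";
        ToyRegionServer rs = new ToyRegionServer();

        State zk = State.OFFLINE;                         // step 1: master1 creates the node
        ToyMaster master2 = new ToyMaster();              // steps 2-3: master1 stops, master2 takes over
        master2.regionStates.put(region, State.OFFLINE);  // step 4: master2 sees 'region1' offline

        rs.open(region);                                  // step 5: RS opens the region...
        zk = State.OPEN;                                  // ...and marks the node opened
        master2.onRegionOpened(region, rs);               // step 6: unexpected OPENED, RS told to close
        master2.processFailoverNode(region, zk);          // step 7: in-memory state set from ZK

        return new Outcome(master2.regionStates.get(region), rs.isServing(region));
    }

    public static void main(String[] args) {
        Outcome o = simulate();
        // prints "master view: OPEN, served by RS: false" (step 8)
        System.out.println("master view: " + o.masterView + ", served by RS: " + o.servedByRs);
    }
}
```

In this model the inconsistency disappears if step 7 runs before the OPENED event is handled, or if the step 6 handler also records the close, which suggests that the problem comes from the ordering of the two code paths rather than from either path alone.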