X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DDC2810775 for ; Wed, 16 Oct 2013 22:18:20 +0000 (UTC)
Received: (qmail 34116 invoked by uid 500); 16 Oct 2013 22:13:05 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 33963 invoked by uid 500); 16 Oct 2013 22:12:58 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 33863 invoked by uid 99); 16 Oct 2013 22:12:49 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Oct 2013 22:12:49 +0000
Date: Wed, 16 Oct 2013 22:12:49 +0000 (UTC)
From: "Jeffrey Zhong (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Reopened] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong reopened HBASE-9773:
----------------------------------

> Master aborted when hbck asked the master to assign a region that was already online
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-9773
>                 URL: https://issues.apache.org/jira/browse/HBASE-9773
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Jimmy Xiang
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: trunk-9773.patch, trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace region due to some security exceptions. hbck INCORRECTLY assumed the region was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a, deployed => } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO [RpcServer.handler=4,port=60000] master.HMaster: Client=hbase//172.18.145.105 assign hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=60000] master.AssignmentManager: Sent CLOSE to gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=60000] master.AssignmentManager: No previous transition plan found (or ignoring an existing plan) for hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated random plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., src=, dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=60000] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=60000] master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794} .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794} .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, boolean forceNewPlan){code} is the method that does all the above. This was called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
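
For readers following the sequence above, here is a toy Java sketch of the failure mode. It is not HBase code and not the change in trunk-9773.patch. The class `RegionStateSketch`, the `State` enum, and the methods `forceOffline` and `assignIfNotOnline` are all hypothetical names, and the set of states allowed to move to OFFLINE is an assumption made for illustration. It shows how forcing a region that is still tracked as OPEN to OFFLINE raises an `IllegalStateException` like the one in the master log, and one possible guard that skips the assign when the region is already online.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model only: every name here is hypothetical, not part of HBase.
final class RegionStateSketch {

    enum State { OFFLINE, PENDING_OPEN, OPEN, PENDING_CLOSE, CLOSED }

    // Assumption for illustration: only these states may move to OFFLINE.
    static final Set<State> MAY_GO_OFFLINE = EnumSet.of(State.CLOSED, State.OFFLINE);

    static final class Region {
        final String encodedName;
        State state;

        Region(String encodedName, State state) {
            this.encodedName = encodedName;
            this.state = state;
        }
    }

    // Mirrors the reported failure: an OPEN region cannot be moved to OFFLINE.
    static void forceOffline(Region r) {
        if (!MAY_GO_OFFLINE.contains(r.state)) {
            throw new IllegalStateException("Unexpected state : {" + r.encodedName
                    + " state=" + r.state + "} .. Cannot transit it to OFFLINE.");
        }
        r.state = State.OFFLINE;
    }

    // One possible guard: treat an assign request for an online region as a no-op.
    // Returns true if an assignment was started, false if the request was skipped.
    static boolean assignIfNotOnline(Region r) {
        if (r.state == State.OPEN) {
            return false; // already online, nothing to do
        }
        forceOffline(r);
        r.state = State.PENDING_OPEN;
        return true;
    }

    public static void main(String[] args) {
        Region online = new Region("a0ac0825ba2d0830614e7f808f31787a", State.OPEN);
        try {
            forceOffline(online); // the unguarded path described in the report
        } catch (IllegalStateException e) {
            System.out.println("forced assign failed: " + e.getMessage());
        }
        System.out.println("guarded assign started: " + assignIfNotOnline(online));
    }
}
```

In this sketch the guard rejects the request before any state is touched, so a client that wrongly believes the region is undeployed cannot push the caller into an unrecoverable error path. Whether a real fix should skip, reject, or reconcile such a request is a design decision this sketch does not settle.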