Date: Mon, 17 Nov 2014 08:08:34 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-12467) Master joins cluster but never completes initialization

[ https://issues.apache.org/jira/browse/HBASE-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-12467:
---------------------------------
    Attachment: HBASE-12467.01.patch

Thanks Stack, that's exactly what I was looking for.
> Master joins cluster but never completes initialization
> -------------------------------------------------------
>
>                 Key: HBASE-12467
>                 URL: https://issues.apache.org/jira/browse/HBASE-12467
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>
>         Attachments: HBASE-12467.00.patch, HBASE-12467.00.patch, HBASE-12467.01.patch
>
>
> While diagnosing a rare failure in IntegrationTestLoadAndVerify, I discovered this scenario. The master was restarted by CM (ChaosMonkey). Upon rejoining the cluster, it successfully assumes responsibility as active master, but the finishInitialization method apparently never completes. The last log line from that thread is:
> {noformat}
> 2014-11-10 17:01:29,940 INFO [master:ip-172-31-9-135:60000] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=ip-172-31-9-136.ec2.internal,60020,1415638551951
> {noformat}
> I see region states populated from existing znodes. The AM (AssignmentManager) inventoried the online regions and acknowledged that this was a master failover. There it sits, responding to RPCs with {{PleaseHoldException: Master is initializing}}.
> For the sake of resiliency, we should detect this scenario and at least release control as active master.
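One way to act on the last point is an initialization watchdog: arm a deadline when the master becomes active, disarm it when initialization finishes, and give up the active-master role if the deadline passes first. The sketch below is a hypothetical, standalone illustration in plain Java, not the attached patch and not an HBase API; InitializationWatchdog, start, markInitialized, and the onTimeout callback are illustrative names, and the timeout value is an assumption. In a real master the callback would have to abort or step down through the master's own mechanisms.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog; names are illustrative, not HBase classes. */
public class InitializationWatchdog {
    // Daemon thread so a forgotten watchdog never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "init-watchdog");
            t.setDaemon(true);
            return t;
        });
    private final Runnable onTimeout;   // e.g. "release active-master role" (assumed)
    private boolean initialized;        // guarded by this
    private boolean fired;              // guarded by this
    private ScheduledFuture<?> pending; // guarded by this

    public InitializationWatchdog(Runnable onTimeout) {
        this.onTimeout = onTimeout;
    }

    /** Arm the deadline when this process becomes the active master. */
    public synchronized void start(long timeout, TimeUnit unit) {
        pending = scheduler.schedule(this::checkTimeout, timeout, unit);
    }

    /** Call at the end of initialization; false means the deadline already fired. */
    public synchronized boolean markInitialized() {
        if (fired) {
            return false; // too late: the role has already been released
        }
        initialized = true;
        if (pending != null) {
            pending.cancel(false);
        }
        return true;
    }

    public synchronized boolean hasFired() {
        return fired;
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    private void checkTimeout() {
        boolean expired;
        synchronized (this) {
            // Decide under the lock so this cannot race with markInitialized().
            expired = !initialized && !fired;
            if (expired) {
                fired = true;
            }
        }
        if (expired) {
            onTimeout.run(); // run the callback outside the lock
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch released = new CountDownLatch(1);
        InitializationWatchdog stuck = new InitializationWatchdog(released::countDown);
        stuck.start(100, TimeUnit.MILLISECONDS);
        // Simulate a master whose initialization never finishes.
        boolean timedOut = released.await(5, TimeUnit.SECONDS);
        System.out.println("stuck master released role: " + timedOut);
        stuck.shutdown();
    }
}
```

The decision to fire is made under the same lock that markInitialized() takes, so a master that finishes initialization right at the deadline is either recorded as initialized or told it was too late, never both.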