Date: Mon, 13 May 2013 21:31:16 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8519) Backup master will never come up if primary master dies during initialization

    [ https://issues.apache.org/jira/browse/HBASE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656409#comment-13656409 ]

Jean-Daniel Cryans commented on HBASE-8519:
-------------------------------------------

bq. +      if (clusterShutDown.get() && !clusterStatusTracker.isClusterUp()) {

Do we still need the second condition? Also, I'm not a fan of the name clusterShutDown, since the cluster could already be down when the master starts. At the very least, we need more documentation explaining why we don't care whether the cluster is down when we start.
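To make the question about the second condition concrete, here is a minimal sketch in Java that compares the original check with the one quoted from the patch. It is a model, not HBase code: `BackupMasterGate`, `onClusterUpFlagDeleted()`, `shouldStopOriginal()` and `shouldStopPatched()` are hypothetical names, a `BooleanSupplier` stands in for `clusterStatusTracker.isClusterUp()`, and it assumes that `clusterShutDown` is set only when this backup master observes the cluster-up flag being removed while it waits.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of the backup master's stop decision (not HBase code).
 * All class and method names are illustrative only.
 */
public class BackupMasterGate {
  // Assumed semantics: set only when this backup master sees the cluster-up
  // flag removed while it is waiting, i.e. an explicit cluster shutdown.
  private final AtomicBoolean clusterShutDown = new AtomicBoolean(false);
  // Stand-in for clusterStatusTracker.isClusterUp().
  private final BooleanSupplier isClusterUp;

  public BackupMasterGate(BooleanSupplier isClusterUp) {
    this.isClusterUp = isClusterUp;
  }

  /** Hypothetical hook, e.g. driven by a ZooKeeper watch on the cluster-up flag. */
  public void onClusterUpFlagDeleted() {
    clusterShutDown.set(true);
  }

  /** Original check: stop whenever the flag is absent, even if it was never set. */
  public boolean shouldStopOriginal() {
    return !isClusterUp.getAsBoolean();
  }

  /** Check quoted in the review: stop only after an observed shutdown. */
  public boolean shouldStopPatched() {
    return clusterShutDown.get() && !isClusterUp.getAsBoolean();
  }

  public static void main(String[] args) {
    // Primary died before setClusterUp(): the flag was never written.
    BackupMasterGate neverUp = new BackupMasterGate(() -> false);
    System.out.println("never up, original: " + neverUp.shouldStopOriginal());
    System.out.println("never up, patched:  " + neverUp.shouldStopPatched());

    // Cluster was shut down while this master waited.
    BackupMasterGate wentDown = new BackupMasterGate(() -> false);
    wentDown.onClusterUpFlagDeleted();
    System.out.println("went down, patched: " + wentDown.shouldStopPatched());
  }
}
```

Running `main` prints `true`, `false`, `true`: only the patched check lets the backup master continue when the flag was never written. In this model the `!isClusterUp` term changes the outcome only when a shutdown was observed and the flag has since been written again (for example, by a fresh cluster start), which is the case the reviewer's first question hinges on.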
> Backup master will never come up if primary master dies during initialization
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-8519
>                 URL: https://issues.apache.org/jira/browse/HBASE-8519
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8519-trunk.patch
>
>
> The problem occurs if the primary master dies after becoming the active master but before it completes initialization and calls clusterStatusTracker.setClusterUp().
> The backup master then tries to become the master, but promptly shuts itself down because it sees that the cluster is not up.
> This is the backup master log:
> 2013-05-09 15:08:05,568 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> 2013-05-09 15:08:05,573 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster started in backup mode. Stalling until master znode is written.
> 2013-05-09 15:08:05,589 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/master already exists and this is not a retry
> 2013-05-09 15:08:05,590 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Adding ZNode for /hbase/backup-masters/xxx.com,60000,1368137285373 in backup master directory
> 2013-05-09 15:08:05,595 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the active master, xxx.com,60000,1368137283107; waiting to become the next active master
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: No master available. Notifying waiting threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.hbase.master.HMaster: Cluster went down before this master became active
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
>
> In ActiveMasterManager::blockUntilBecomingActiveMaster():
> {code}
> ..
>       if (!clusterStatusTracker.isClusterUp()) {
>         this.master.stop(
>           "Cluster went down before this master became active");
>       }
> ..
> {code}
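The failure sequence in the description can be replayed without a cluster. The sketch below is a hypothetical, single-threaded Java model, not HBase code: `Hbase8519Replay` and its methods are illustrative names, ZooKeeper state is reduced to two booleans, and it assumes the master znode is ephemeral (it disappears when the primary dies) while the cluster-up flag is persistent. The backup applies the original `!clusterStatusTracker.isClusterUp()` check quoted above.

```java
/**
 * Hypothetical single-threaded replay of the HBASE-8519 sequence.
 * ZooKeeper state is reduced to two booleans; no names here are HBase APIs.
 */
public class Hbase8519Replay {
  // Stands in for the ephemeral /hbase/master znode.
  boolean masterZnodeExists = false;
  // Stands in for the flag read by clusterStatusTracker.isClusterUp();
  // modelled as persistent, so it survives the primary's death.
  boolean clusterUpFlag = false;

  /** The primary becomes active; if crashDuringInit, it dies before setClusterUp(). */
  void primaryStarts(boolean crashDuringInit) {
    masterZnodeExists = true;
    if (crashDuringInit) {
      primaryDies();
      return; // setClusterUp() is never reached
    }
    clusterUpFlag = true; // models clusterStatusTracker.setClusterUp()
  }

  /** The primary's session ends, so its ephemeral master znode disappears. */
  void primaryDies() {
    masterZnodeExists = false;
  }

  /** The backup re-checks after being notified, applying the original check. */
  String backupWakesUp() {
    if (masterZnodeExists) {
      return "waiting";
    }
    if (!clusterUpFlag) {
      return "stopped: Cluster went down before this master became active";
    }
    return "active";
  }

  public static void main(String[] args) {
    // Primary dies during initialization: the backup stops itself.
    Hbase8519Replay crash = new Hbase8519Replay();
    crash.primaryStarts(true);
    System.out.println(crash.backupWakesUp());

    // Primary finishes initialization, then dies: the backup takes over.
    Hbase8519Replay normal = new Hbase8519Replay();
    normal.primaryStarts(false);
    normal.primaryDies();
    System.out.println(normal.backupWakesUp());
  }
}
```

The two runs differ only in whether the primary reached `setClusterUp()`, which is why the original check cannot tell "the cluster never came up" apart from "the cluster was shut down".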