Date: Thu, 30 Jun 2011 13:02:28 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1462116090.5363.1309438948422.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <600513184.31553.1304907843123.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-3867) when cluster is stopped and server which hosted meta region is removed from cluster, master breaks down after restarting cluster.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/HBASE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3867:
--------------------------

    Attachment: 3867-trunk.txt

TRUNK version of the patch.
Now DisabledTestRegionServerExit passes.

> when cluster is stopped and server which hosted meta region is removed from cluster, master breaks down after restarting cluster.
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3867
>                 URL: https://issues.apache.org/jira/browse/HBASE-3867
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1, 0.90.2
>            Reporter: Liu Jia
>            Priority: Critical
>             Fix For: 0.90.2
>
>         Attachments: 3867-trunk.txt, CatalogTracker.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the cluster is stopped, the server that hosted the meta region is removed from the cluster, and the cluster is then restarted,
> the following code throws "NoRouteToHostException":
> package org.apache.hadoop.hbase.catalog;
> public class CatalogTracker
>   private HRegionInterface getMetaServerConnection(boolean refresh)
>   throws IOException, InterruptedException {
>     synchronized (metaAvailable) {
>       if (metaAvailable.get()) {
>         HRegionInterface current = getCachedConnection(metaLocation);
>         if (!refresh) {
>           return current;
>         }
>         if (verifyRegionLocation(current, this.metaLocation, META_REGION)) {
>           return current;
>         }
>         resetMetaLocation();
>       }
>       HRegionInterface rootConnection = getRootServerConnection();
>       if (rootConnection == null) {
>         return null;
>       }
>       HServerAddress newLocation = MetaReader.readMetaLocation(rootConnection);
>       if (newLocation == null) {
>         return null;
>       }
>       // The following line throws the exception.
>       HRegionInterface newConnection = getCachedConnection(newLocation);
>       if (verifyRegionLocation(newConnection, this.metaLocation, META_REGION)) {
>         setMetaLocation(newLocation);
>         return newConnection;
>       }
>       return null;
>     }
>   }
> // The following method does not handle the exception.
> public class CatalogTracker
>   public boolean verifyMetaRegionLocation(final long timeout)
>   throws InterruptedException, IOException {
>     return getMetaServerConnection(true) != null;
>   }
> // The master calls this CatalogTracker method and does not handle the exception either.
> package org.apache.hadoop.hbase.master;
> public class HMaster
>   int assignRootAndMeta()
>   throws InterruptedException, IOException, KeeperException {
>     int assigned = 0;
>     long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);
>     // Work on ROOT region.  Is it in zk in transition?
>     boolean rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
>     if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>       this.assignmentManager.assignRoot();
>       this.catalogTracker.waitForRoot();
>       assigned++;
>     }
>     LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getRootLocation());
>     // Work on meta region
>     rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
>     // When the cluster is restarted, the master breaks down here.
>     if (!this.catalogTracker.verifyMetaRegionLocation(timeout)) {
>       this.assignmentManager.assignMeta();
>       this.catalogTracker.waitForMeta();
>       // Above check waits for general meta availability but this does not
>       // guarantee that the transition has completed
>       this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
>       assigned++;
>     }
>     LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getMetaLocation());
>     return assigned;
>   }
> Thanks to JunQiang Yuan at www.alipay.com for providing information about this bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
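
The failure mode described in this issue, where a connection attempt to a removed host propagates an exception all the way up through the master, can be illustrated with a minimal standalone sketch. This is not the attached patch and does not use HBase classes: MetaLocationCheck, MetaConnector, and verifyMetaLocation are hypothetical names, and the sketch assumes that a connectivity failure should count as "location not verified" so that the caller can go on to reassign the region.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; none of these names exist in HBase.
final class MetaLocationCheck {
    private MetaLocationCheck() {}

    // Stand-in for the step that opens a connection to the recorded server.
    interface MetaConnector {
        Object connect(String address) throws IOException;
    }

    /**
     * Returns true only if a connection to the recorded address can be opened.
     * Connectivity failures are reported as "not verified" (false) instead of
     * being rethrown; any other IOException still propagates to the caller.
     */
    static boolean verifyMetaLocation(MetaConnector connector, String address)
            throws IOException {
        if (address == null) {
            return false;
        }
        try {
            return connector.connect(address) != null;
        } catch (NoRouteToHostException | ConnectException | SocketTimeoutException e) {
            // Assumption: the host is gone or unreachable, so the recorded
            // location is stale and the region needs to be reassigned.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulates a server that was removed while the cluster was down.
        MetaConnector removedHost = address -> {
            throw new NoRouteToHostException("No route to host: " + address);
        };
        // Simulates a server that is still reachable.
        MetaConnector liveHost = address -> new Object();

        System.out.println(verifyMetaLocation(removedHost, "rs1.example.org:60020"));
        System.out.println(verifyMetaLocation(liveHost, "rs2.example.org:60020"));
    }
}
```

With a check shaped like this, a caller in the position of assignRootAndMeta() would receive false for an unreachable server and could proceed to assign .META. rather than terminating on the exception. Which exception types to treat as "unreachable" is a design choice; the three caught here are only an assumption.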