Date: Thu, 24 May 2012 21:59:44 +0000 (UTC)
From: "Zhihong Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <2109526028.154.1337896804620.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1911362584.16490.1335950570941.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282808#comment-13282808 ]

Zhihong Yu commented on HBASE-5916:
-----------------------------------

The failure in TestClockSkewDetection was due to NPE.
The following change makes it pass:
{code}
if ((this.services == null || ((HMaster) this.services).isInitialized()) && this.deadservers.cleanPreviousInstance(serverName)) {
{code}

{code}
+ * To clear any dead server with same host name and port of online server
{code}
I think 'any' should be added in front of 'online server'.

{code}
+ public void clearDeadServersWithSameHostNameAndPortOfOnlineServer() {
{code}
The above method can be package private, right?

{code}
+ while ((sn = ServerName.findServerWithSameHostnamePort(this.deadservers, serverName)) != null) {
{code}
The above line exceeds 100 chars.

{code}
+ if(actualDeadServers.contains(deadServer.getKey())){
{code}
Add spaces after if and before {.

> RS restart just before master intialization we make the cluster non operative
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-5916
>                 URL: https://issues.apache.org/jira/browse/HBASE-5916
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.94.1
>
>         Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch
>
>
> Consider a case where my master is getting restarted. RS that was alive when the master restart started, gets restarted before the master initializes the ServerShutDownHandler.
> {code}
> serverShutdownHandlerEnabled = true;
> {code}
> In this case when the RS tries to register with the master, the master will try to expire the server but the server cannot be expired as still the serverShutdownHandler is not enabled.
> This case may happen when i have only one RS gets restarted or all the RS gets restarted at the same time.(before assignRootandMeta).
> {code}
> LOG.info(message);
> if (existingServer.getStartcode() < serverName.getStartcode()) {
>   LOG.info("Triggering server recovery; existingServer " +
>     existingServer + " looks stale, new server:" + serverName);
>   expireServer(existingServer);
> }
> {code}
> If another RS is brought up then the cluster comes back to normalcy.
> May be a very corner case.
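
For readers following the review, here is a minimal sketch of the null guard proposed in the comment above. It assumes (the thread does not confirm this) that the NPE in TestClockSkewDetection came from the test running without a master services object. `ExpireGuardSketch`, `Services`, `clearIfAllowed`, and `isDead` are hypothetical stand-ins, not HBase classes or methods; only the shape of the condition mirrors the proposed change.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the server manager; not an HBase class.
class ExpireGuardSketch {
    // Hypothetical stand-in for the master services handle (assumed to be null in a unit test).
    interface Services {
        boolean isInitialized();
    }

    private final Services services;
    private final Set<String> deadServers = new HashSet<>();

    ExpireGuardSketch(Services services, String... deadHostPorts) {
        this.services = services;
        for (String hostPort : deadHostPorts) {
            deadServers.add(hostPort);
        }
    }

    // Removes the dead-server entry for this host:port; returns true if one existed.
    boolean cleanPreviousInstance(String hostPort) {
        return deadServers.remove(hostPort);
    }

    // Same shape as the proposed condition: the null check short-circuits
    // before isInitialized() is called, so a null handle cannot throw.
    boolean clearIfAllowed(String hostPort) {
        return (services == null || services.isInitialized())
            && cleanPreviousInstance(hostPort);
    }

    boolean isDead(String hostPort) {
        return deadServers.contains(hostPort);
    }

    public static void main(String[] args) {
        // No services object at all: the guard lets the cleanup proceed.
        ExpireGuardSketch noServices = new ExpireGuardSketch(null, "rs1:60020");
        System.out.println(noServices.clearIfAllowed("rs1:60020")); // prints "true"

        // Services present but not yet initialized: the dead entry is left alone.
        ExpireGuardSketch notReady = new ExpireGuardSketch(() -> false, "rs1:60020");
        System.out.println(notReady.clearIfAllowed("rs1:60020")); // prints "false"
    }
}
```

The ordering of the two operands matters in this sketch: because `&&` evaluates left to right, the dead-server entry is only removed once the initialization check has passed.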