Date: Tue, 8 May 2012 19:47:49 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Message-ID: <142146927.40681.1336506469427.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1911362584.16490.1335950570941.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

    [
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270764#comment-13270764 ]

stack commented on HBASE-5916:
------------------------------

Why make this change?

{code}
-      final Set onlineServers)
+      Set onlineServers)
{code}

Will alreadyOnlineSlowRS be a superset of onlineServers, so that you only need to check it rather than both, as in the below?

{code}
+        } else if (!onlineServers.contains(regionLocation)
+            && !alreadyOnlineSlowRS.contains(regionLocation)) {
{code}

I'm not sure master time is what you want. What you want is the filesystem time, the time the namenode is using. I'm not sure, but my guess would be that the modtime for files in hdfs is set by the namenode; if there is a difference between the master and namenode clocks, there could be a hole through which some WALs could slip if we use master time. What do you think, Ram?

Otherwise, the patch is great. I love the test.

> RS restart just before master intialization we make the cluster non operative
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-5916
>                 URL: https://issues.apache.org/jira/browse/HBASE-5916
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.94.1
>
>         Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch
>
>
> Consider a case where my master is getting restarted. An RS that was alive when the master restart started gets restarted before the master initializes the ServerShutDownHandler.
> {code}
> serverShutdownHandlerEnabled = true;
> {code}
> In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired as the serverShutdownHandler is still not enabled.
> This case may happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootandMeta).
> {code}
> LOG.info(message);
> if (existingServer.getStartcode() < serverName.getStartcode()) {
>   LOG.info("Triggering server recovery; existingServer " +
>     existingServer + " looks stale, new server:" + serverName);
>   expireServer(existingServer);
> }
> {code}
> If another RS is brought up then the cluster comes back to normalcy.
> May be a very corner case.
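
To make the clock concern in the comment above concrete, here is a minimal Java sketch. It is not taken from the patch: the class name WalCutoffSketch, the method isOlderThanCutoff, the "modtime earlier than cutoff" rule, and all timestamps are assumptions made up for illustration. It only shows how a WAL stamped by a namenode whose clock runs behind the master can land on the wrong side of a cutoff read from the master's clock.

```java
public class WalCutoffSketch {

    // Assumed rule (hypothetical, not from the patch): a WAL counts as "old"
    // when its modification time is earlier than the recorded cutoff.
    public static boolean isOlderThanCutoff(long walModTimeMs, long cutoffMs) {
        return walModTimeMs < cutoffMs;
    }

    public static void main(String[] args) {
        // Made-up numbers: the namenode clock runs 5 seconds behind the master.
        long skewMs = 5_000L;

        // The same instant, read from each clock.
        long masterCutoffMs = 100_000L;
        long namenodeCutoffMs = masterCutoffMs - skewMs;

        // A WAL written 2 seconds AFTER the cutoff instant. Its modtime is
        // stamped by the namenode, so it carries the namenode's skew.
        long walModTimeMs = (masterCutoffMs + 2_000L) - skewMs;

        // Comparing a namenode-stamped modtime with a master-clock cutoff
        // misclassifies the WAL as older than the cutoff.
        System.out.println("master-clock cutoff says old:   "
            + isOlderThanCutoff(walModTimeMs, masterCutoffMs));

        // Comparing against a cutoff expressed in namenode time does not.
        System.out.println("namenode-clock cutoff says old: "
            + isOlderThanCutoff(walModTimeMs, namenodeCutoffMs));
    }
}
```

With these numbers the first check reports the WAL as old and the second does not, which is the hole described in the comment. One way to obtain a cutoff in filesystem time, assuming the namenode does set modtimes, would be to read the modification time of a freshly written file through Hadoop's FileSystem.getFileStatus(path).getModificationTime() rather than calling System.currentTimeMillis() on the master.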