Date: Fri, 4 May 2012 16:48:50 +0000 (UTC)
From: "Zhihong Yu (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1129441608.27754.1336150130498.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1251028975.233.1335352082706.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

[
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268525#comment-13268525 ]

Zhihong Yu commented on HBASE-5875:
-----------------------------------

The following change is for debugging, right? If so, please change the log level accordingly:
{code}
+    }catch(NotServingRegionException nsre){
+      LOG.info("Failed verification of " + Bytes.toStringBinary(regionName) +
+        " at address=" + address + "; " + t);
+      throw nsre;
{code}

{code}
+    } catch (NotServingRegionException nsre) {
+      if(rit == true){
+        // the root region location is available.
{code}
People unfamiliar with processRegionInTransitionAndBlockUntilAssigned() may get confused by the code above. rit actually means that the root region has come out of transition, so rit should be named accordingly.

{code}
+  public void setServerShutdownHandlerEnabled(boolean setServerShutDownEnabled) {
{code}
The above method should be made package-private. Appending 'ForTest' to the end of the method name would help clarify its purpose.

> Process RIT and Master restart may remove an online server considering it as a dead server
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5875
>                 URL: https://issues.apache.org/jira/browse/HBASE-5875
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.94.1
>
>         Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch
>
>
> If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT.
> Master will trigger the assignment and next will try to verify the Root Region Location.
> Root region location verification is done by checking whether the RS has the region in its online list.
> If the master-triggered assignment has not yet been completed in the RS then the verification of the root region location will fail.
> Because it failed
> {code}
> splitLogAndExpireIfOnline(currentRootServer);
> {code}
> we do split log and also remove the server from the online server list. Ideally there is nothing to do here in splitlog, as no region server was restarted.
> So the master invalidates the region server even though the server is online.
> In a special case, if I have only one RS then my cluster will become non-operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
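
For readers following the quoted issue, here is a minimal sketch of the failure mode and one way to avoid it. It is not the HBASE-5875 patch and does not use the HBase API: the class RootVerificationSketch, the enums VerifyResult and Action, and the methods decide() and apply() are all hypothetical names chosen for illustration. The assumption is that a server which answers "not serving yet" is alive, so the master should retry verification instead of expiring it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not HBase code: shows why a "not serving yet" answer
// should not be treated the same way as an unreachable server.
public class RootVerificationSketch {

    /** Assumed outcomes of asking a region server whether it hosts ROOT. */
    public enum VerifyResult { SERVING, NOT_SERVING_YET, UNREACHABLE }

    /** Assumed actions the master can take after a verification attempt. */
    public enum Action { KEEP_LOCATION, RETRY_VERIFICATION, EXPIRE_SERVER }

    /** Maps a verification outcome to an action. */
    public static Action decide(VerifyResult result) {
        switch (result) {
            case SERVING:
                return Action.KEEP_LOCATION;
            case NOT_SERVING_YET:
                // The server responded, so it is online; the assignment the
                // master triggered may simply not have completed yet.
                return Action.RETRY_VERIFICATION;
            default:
                // No response at all: only here is expiring the server justified.
                return Action.EXPIRE_SERVER;
        }
    }

    /** Returns a copy of the online-server set after applying the decision. */
    public static Set<String> apply(Set<String> onlineServers, String server,
                                    VerifyResult result) {
        Set<String> updated = new HashSet<>(onlineServers);
        if (decide(result) == Action.EXPIRE_SERVER) {
            updated.remove(server);
        }
        return updated;
    }

    public static void main(String[] args) {
        // Single-RS cluster from the issue description ("rs1" is a made-up name).
        Set<String> online = new HashSet<>();
        online.add("rs1");
        System.out.println(apply(online, "rs1", VerifyResult.NOT_SERVING_YET));
        System.out.println(apply(online, "rs1", VerifyResult.UNREACHABLE));
    }
}
```

In the single-RS case from the issue, the NOT_SERVING_YET branch leaves the only server in the online set, so the cluster stays operative while the assignment completes.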