Return-Path:
X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 12813EBBA for ; Wed, 9 Jan 2013 02:24:13 +0000 (UTC)
Received: (qmail 25779 invoked by uid 500); 9 Jan 2013 02:24:13 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 25727 invoked by uid 500); 9 Jan 2013 02:24:13 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 25371 invoked by uid 99); 9 Jan 2013 02:24:13 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Jan 2013 02:24:13 +0000
Date: Wed, 9 Jan 2013 02:24:13 +0000 (UTC)
From: "rajeshbabu (JIRA)"
To: issues@hbase.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547547#comment-13547547 ]

rajeshbabu commented on HBASE-7504:
-----------------------------------

[~zjushch]
Patch looks good. Can we avoid calling server.getCatalogTracker().getRootLocation() (which reads the znode in ZooKeeper) twice in the normal case?
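To illustrate the question, here is a minimal sketch of reading the location once and reusing the local value. It does not quote the attached patch. RootLocationSource, RootCheck and CountingSource are hypothetical stand-ins, and the location is modelled as a plain String rather than HBase's own types; the only name taken from this thread is getRootLocation().

```java
// Hypothetical stand-in for the ZooKeeper-backed lookup discussed above.
// This is NOT the real CatalogTracker API; the location is a plain String here.
interface RootLocationSource {
    String getRootLocation();
}

final class RootCheck {
    // Reads the location once and reuses the local value, so the normal
    // path costs a single lookup instead of two.
    static boolean isRootOnDeadServer(RootLocationSource source, String deadServer) {
        String location = source.getRootLocation(); // the only read
        return location != null && location.equals(deadServer);
    }
}

// Hypothetical test double that counts how many times the location is read.
final class CountingSource implements RootLocationSource {
    int reads;
    private final String location;

    CountingSource(String location) {
        this.location = location;
    }

    @Override
    public String getRootLocation() {
        reads++;
        return location;
    }
}
```

With a CountingSource wrapping a fixed location, one call to RootCheck.isRootOnDeadServer leaves reads at 1, whichever way the comparison goes.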
> -ROOT- may be offline forever after FullGC of RS
> -------------------------------------------------
>
>                 Key: HBASE-7504
>                 URL: https://issues.apache.org/jira/browse/HBASE-7504
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch
>
>
> 1. A full GC happens on the regionserver hosting ROOT.
> 2. The ZK session times out; the master expires the regionserver and submits it to ServerShutdownHandler.
> 3. The regionserver completes the full GC.
> 4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns true.
> 5. ServerShutdownHandler skips assigning the ROOT region.
> 6. The regionserver aborts itself because it receives YouAreDeadException after a regionserver report.
> 7. ROOT is now offline and won't be assigned again unless we restart the master.
> Master Log:
> {code}
> 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown handler to be executed, root=true, meta=false
> 2012-10-31 19:51:39,045 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dw88.kgb.sqa.cm4,60020,1351671478752
> 2012-10-31 19:51:50,113 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
> 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
> 2012-10-31 19:52:15,945 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log splitting for dw88.kgb.sqa.cm4,60020,1351671478752
> {code}
> No log of assigning ROOT
> Regionserver log:
> {code}
> 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 229128ms instead of 100000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira