Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 9 Jan 2013 02:24:13 +0000 (UTC)
From: "rajeshbabu (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12626319.1357551761710.101069.1357698253261@arcas>
In-Reply-To: <JIRA.12626319.1357551761710@arcas>
References: <JIRA.12626319.1357551761710@arcas>
Subject: [jira] [Commented] (HBASE-7504) -ROOT- may be offline forever after
 FullGC of  RS
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547547#comment-13547547 ] 

rajeshbabu commented on HBASE-7504:
-----------------------------------

[~zjushch] 
Patch looks good.
Can we avoid calling server.getCatalogTracker().getRootLocation() (which reads a znode in ZooKeeper) twice in the normal case?
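
One way to do this would be to read the location once into a local variable and reuse it in every branch. Below is a minimal sketch of that idea, not the patch itself: RootLocator, CountingLocator and decide() are hypothetical stand-ins (the real CatalogTracker is not used here), and the counter only exists to make the number of simulated ZooKeeper reads visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RootLocationOnce {
    /** Hypothetical stand-in for the catalog tracker; not the HBase class. */
    interface RootLocator {
        String getRootLocation();
    }

    /** Counts lookups so the number of simulated ZooKeeper reads is visible. */
    static final class CountingLocator implements RootLocator {
        final AtomicInteger reads = new AtomicInteger();
        private final String location;

        CountingLocator(String location) {
            this.location = location;
        }

        @Override
        public String getRootLocation() {
            reads.incrementAndGet();
            return location;
        }
    }

    /** Reads the root location once and reuses the value in every branch. */
    static String decide(RootLocator locator, String deadServer) {
        String rootLocation = locator.getRootLocation(); // single read
        if (rootLocation == null) {
            return "ROOT location unknown";
        }
        if (deadServer.equals(rootLocation)) {
            return "ROOT still on dead server " + deadServer;
        }
        return "skip assigning ROOT, online on " + rootLocation;
    }

    public static void main(String[] args) {
        CountingLocator locator = new CountingLocator("rs2,60020,1");
        System.out.println(decide(locator, "rs1,60020,1"));
        System.out.println("reads=" + locator.reads.get());
    }
}
```

Besides saving a read, a single lookup means the comparison and the log message are based on the same value, even if the znode changes in between.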
                
> -ROOT- may be offline forever after FullGC of  RS
> -------------------------------------------------
>
>                 Key: HBASE-7504
>                 URL: https://issues.apache.org/jira/browse/HBASE-7504
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 7504-trunk v1.patch, 7504-trunk v2.patch
>
>
> 1. A full GC happens on the regionserver hosting ROOT.
> 2. The ZK session times out; the master expires the regionserver and submits it to ServerShutdownHandler.
> 3. The regionserver completes the full GC.
> 4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns true.
> 5. ServerShutdownHandler skips assigning the ROOT region.
> 6. The regionserver aborts itself because it receives a YouAreDeadException after a regionserver report.
> 7. ROOT is now offline and won't be assigned again unless we restart the master.
> Master Log:
> {code}
> 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown handler to be executed, root=true, meta=false
> 2012-10-31 19:51:39,045 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dw88.kgb.sqa.cm4,60020,1351671478752
> 2012-10-31 19:51:50,113 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
> 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
> 2012-10-31 19:52:15,945 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log splitting for dw88.kgb.sqa.cm4,60020,1351671478752
> {code}
> There is no log entry showing ROOT actually being assigned.
> Regionserver log:
> {code}
> 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 229128ms instead of 100000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> {code}
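
For readers following the thread, the sequence in the quoted description can be replayed with a toy model. This is only a sketch under stated assumptions: the Master and RegionServer classes below are hypothetical and heavily simplified, they are not HBase classes, and the "probe" merely stands in for verifyRootRegionLocation returning true once the GC pause has ended.

```java
public class RootRaceSketch {
    /** Toy region server; hypothetical, not an HBase class. */
    static final class RegionServer {
        boolean alive = true;
        boolean hostsRoot = true;

        /** Step 6: a report rejected by the master makes the server abort. */
        void reportTo(Master master) {
            if (master.isDead(this)) { // stands in for YouAreDeadException
                alive = false;
                hostsRoot = false;
            }
        }
    }

    /** Toy master that tracks a single expired server. */
    static final class Master {
        RegionServer expired;
        boolean rootAssignedElsewhere = false;

        boolean isDead(RegionServer rs) {
            return rs == expired;
        }

        /** Step 2: the ZK session timed out, so the server is marked dead. */
        void expire(RegionServer rs) {
            expired = rs;
        }

        /** Steps 4-5: probe ROOT and reassign only if the probe fails. */
        void handleShutdown(RegionServer rs) {
            boolean rootReachable = rs.alive && rs.hostsRoot;
            if (!rootReachable) {
                rootAssignedElsewhere = true;
            }
            // Otherwise assignment is skipped, although rs is already marked dead.
        }
    }

    /** Runs steps 1-7 in order and reports whether ROOT ends up online. */
    static boolean rootOnlineAfterScenario() {
        Master master = new Master();
        RegionServer rs = new RegionServer();
        master.expire(rs);         // steps 1-2: full GC, session timeout
        // step 3: the GC completes, so rs answers probes again
        master.handleShutdown(rs); // steps 4-5: probe succeeds, assignment skipped
        rs.reportTo(master);       // step 6: report rejected, rs aborts
        return rs.hostsRoot || master.rootAssignedElsewhere; // step 7
    }

    public static void main(String[] args) {
        System.out.println("ROOT online: " + rootOnlineAfterScenario());
    }
}
```

In this model ROOT ends up offline because the probe only asks whether ROOT is reachable, not whether it is hosted by the server that was just declared dead.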
