hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.
Date Fri, 29 Jun 2012 20:25:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404175#comment-13404175 ]

stack commented on HBASE-6289:
------------------------------

The verify of root was added by:

{code}
Revision 1127158 - (view) (download) (annotate) - [select for diffs] 
Modified Tue May 24 17:25:42 2011 UTC (13 months ago) by stack 
Original Path: hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java

File length: 13616 byte(s) 
Diff to previous 1097275 (colored)
HBASE-3914 ROOT region appeared in two regionserver's onlineRegions at the same time
{code}

It was added by Jieshan to narrow the window in which SSH and a newly starting master race each other.

Meta was never 'verified'. Ever since SSH was created, it has simply gone ahead and assigned .META. whenever the server being processed was the one hosting .META.

So, I'd say, if we are to preserve Jieshan's fix, we need Maryann's patch as is?  I'm +1 on
commit.
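The asymmetry described above can be sketched as a small model. This is a minimal, hypothetical sketch: `CatalogVerifier`, `Assigner`, `RecordingAssigner` and `processDeadServer` are illustrative names, not HBase's actual classes and not the contents of HBASE-6289.patch. The probe stands in for `verifyRootRegionLocation(timeout)`, and the 1000 ms value mirrors the default of `hbase.catalog.verification.timeout` in the code quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class SshCatalogSketch {
    // Hypothetical stand-in for CatalogTracker.verifyRootRegionLocation(timeout).
    interface CatalogVerifier {
        boolean rootLocationVerifies(long timeoutMs);
    }

    // Hypothetical stand-in for the master's AssignmentManager.
    interface Assigner {
        void assignRoot();
        void assignMeta();
    }

    // Records which assignments were requested, for inspection.
    static final class RecordingAssigner implements Assigner {
        final List<String> calls = new ArrayList<>();
        public void assignRoot() { calls.add("assignRoot"); }
        public void assignMeta() { calls.add("assignMeta"); }
    }

    // Assumed shape of SSH's catalog handling for one dead server.
    static void processDeadServer(boolean carryingRoot, boolean carryingMeta,
                                  CatalogVerifier verifier, Assigner assigner) {
        if (carryingRoot && !verifier.rootLocationVerifies(1000)) {
            // -ROOT- is only reassigned when the location probe fails.
            assigner.assignRoot();
        }
        if (carryingMeta) {
            // .META. is never verified: it is always reassigned.
            assigner.assignMeta();
        }
    }

    public static void main(String[] args) {
        RecordingAssigner assigner = new RecordingAssigner();
        // The processed server still answers the probe, as in the reported scenario.
        processDeadServer(true, true, timeoutMs -> true, assigner);
        System.out.println(assigner.calls); // prints [assignMeta]
    }
}
```

With a probe that still succeeds, only .META. is reassigned; -ROOT- is left on the server being processed, which is the gap the issue describes.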
                
> ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6289
>                 URL: https://issues.apache.org/jira/browse/HBASE-6289
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.94.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Critical
>         Attachments: HBASE-6289.patch
>
>
> The ROOT RS has a network problem and its ZK node expires first, which kicks off the ServerShutdownHandler. It calls verifyAndAssignRoot() to try to re-assign ROOT. At that time, the RS is actually still working and passes the verifyRootRegionLocation() check, so the ROOT region is not re-assigned.
> {code}
>   private void verifyAndAssignRoot()
>   throws InterruptedException, IOException, KeeperException {
>     long timeout = this.server.getConfiguration().
>       getLong("hbase.catalog.verification.timeout", 1000);
>     if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout)) {
>       this.services.getAssignmentManager().assignRoot();
>     }
>   }
> {code}
> A few moments later, this RS encounters a DFS write problem and decides to abort. The RS is then soon restarted from the command line, and constantly reports:
> {code}
> 2012-06-27 23:13:08,627 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,627 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,630 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0
> {code}
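The failure sequence in the description can be reproduced with a toy model. Everything here is hypothetical (`RegionServer`, `isServing`, and this `verifyAndAssignRoot` are simplified stand-ins, not HBase code). The point it illustrates: a probe that only asks whether the recorded host is still serving ROOT never looks at the expired ZK node, so it skips reassignment, and nothing reassigns ROOT once the RS later aborts and restarts empty.

```java
import java.util.HashSet;
import java.util.Set;

public class RootReassignRace {
    // Hypothetical, simplified model of a region server; not an HBase class.
    static final class RegionServer {
        boolean processAlive = true;   // RPC endpoint still answering
        boolean zkNodePresent = true;  // ephemeral znode in ZooKeeper
        final Set<String> onlineRegions = new HashSet<>();

        // Note: this check never consults zkNodePresent.
        boolean isServing(String region) {
            return processAlive && onlineRegions.contains(region);
        }
    }

    static final String ROOT = "-ROOT-,,0";

    // Models the verify-then-assign logic: returns true if ROOT would be (re)assigned.
    static boolean verifyAndAssignRoot(RegionServer recordedRootHost) {
        boolean verified = recordedRootHost.isServing(ROOT);
        return !verified; // assign only when verification fails
    }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer();
        rs.onlineRegions.add(ROOT);

        // 1. Network trouble: the ZK session expires, but the process keeps running.
        rs.zkNodePresent = false;

        // 2. ServerShutdownHandler runs; the probe still reaches the live process.
        boolean reassigned = verifyAndAssignRoot(rs);

        // 3. The RS later aborts on a DFS write error; once restarted, it holds no regions.
        rs.onlineRegions.clear();

        System.out.println("reassigned=" + reassigned + ", rootServing=" + rs.isServing(ROOT));
        // prints reassigned=false, rootServing=false
    }
}
```

The end state, where ROOT is neither reassigned nor served, corresponds to the repeated NotServingRegionException lines in the log above.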
