hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maryann Xue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.
Date Thu, 28 Jun 2012 13:57:43 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403106#comment-13403106
] 

Maryann Xue commented on HBASE-6289:
------------------------------------

@ramkrishna: Yes, i thought of this too. but i saw this comment here before verifyAndAssignRoot():
"Before assign the ROOT region, ensure it haven't been assigned by other place". Not sure
if this "ROOT assigned elsewhere" situation will actually possibly occur, but we seem to have
seen META assigned on several Region Servers at the same time when there was chaos going on
in our lab's network. There can be only one single search path for any region (incl. meta
and root), though, regardless of client cache. And this is the thing i don't understand, why
we try to treat ROOT differently?
                
> ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working
but only the RS's ZK node expires.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6289
>                 URL: https://issues.apache.org/jira/browse/HBASE-6289
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.94.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Critical
>         Attachments: HBASE-6289.patch
>
>
> The ROOT RS has some network problem and its ZK node expires first, which kicks off the
ServerShutdownHandler. it calls verifyAndAssignRoot() to try to re-assign ROOT. At that time,
the RS is actually still working and passes the verifyRootRegionLocation() check, so the ROOT
region is skipped from re-assignment.
>   private void verifyAndAssignRoot()
>   throws InterruptedException, IOException, KeeperException {
>     long timeout = this.server.getConfiguration().
>       getLong("hbase.catalog.verification.timeout", 1000);
>     if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout)) {
>       this.services.getAssignmentManager().assignRoot();
>     }
>   }
> After a few moments, this RS encounters DFS write problem and decides to abort. The RS
then soon gets restarted from commandline, and constantly report:
> 2012-06-27 23:13:08,627 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,627 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,628 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0
> 2012-06-27 23:13:08,630 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException;
Region is not online: -ROOT-,,0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message