hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
Date Tue, 24 Apr 2012 05:53:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260246#comment-13260246 ]

stack commented on HBASE-5849:

There is something wrong now.  This test won't complete for me (though it has previously).
I thought it was the subsequent commit:

r1329555 | larsh | 2012-04-23 22:12:45 -0700 (Mon, 23 Apr 2012) | 1 line

Refuse operations from Admin before master is initialized - fix for all branches

...that was causing the problem, but with that removed it's still not completing.

I poked around in the debugger and was getting an NPE in reportForDuty after the master came up
because this.hbaseMaster was null; we were failing to allocate the Interface (hard to trace
because toString would throw its own exception).

For now backing this out.
> On first cluster startup, RS aborts if root znode is not available
> ------------------------------------------------------------------
>                 Key: HBASE-5849
>                 URL: https://issues.apache.org/jira/browse/HBASE-5849
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver, zookeeper
>    Affects Versions: 0.92.2, 0.96.0, 0.94.1
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.92.2, 0.94.0
>         Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch
> When launching a fresh new cluster, the master has to be started first, which can create
> a race condition when the master and RS are started at the same time.
> Master startup code is something like this:
>  - establish zk connection
>  - create root znodes in zk (/hbase)
>  - create ephemeral node for master (/hbase/master)
> Region server startup code is something like this:
>  - establish zk connection
>  - check whether the root znode (/hbase) is there. If not, shut down.
>  - wait for the master to create the znode /hbase/master
> So the problem is that on the very first launch of the cluster, the RS aborts on startup because
> the /hbase znode might not have been created yet (only the master creates it if needed). Since
> /hbase is not deleted on cluster shutdown, on subsequent cluster starts it does not matter
> in which order the servers are started. So this affects only first launches.
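
The race in the description above can be reproduced, and one possible way around it
sketched, without a real ZooKeeper ensemble. The example below is only an illustration
under stated assumptions: FakeZk is a hypothetical in-memory stand-in for the znode
tree, and waitForZnode is a made-up helper, not the HBase or ZooKeeper API and not
necessarily what the attached patches do. It shows a region server that is started
before the master and waits for /hbase to appear rather than aborting.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: FakeZk is a hypothetical stand-in, NOT the ZooKeeper or HBase API.
class RootZnodeWait {

    /** Hypothetical in-memory stand-in for the set of znodes held by ZooKeeper. */
    static class FakeZk {
        private final Set<String> znodes = ConcurrentHashMap.newKeySet();
        void create(String path) { znodes.add(path); }
        boolean exists(String path) { return znodes.contains(path); }
    }

    /**
     * Polls until the znode exists or the timeout elapses.
     * Returns true if the znode was seen, false on timeout or interrupt.
     */
    static boolean waitForZnode(FakeZk zk, String path, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (zk.exists(path)) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeZk zk = new FakeZk();

        // "Master": creates the root znode, then its own node, slightly late.
        Thread master = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                return;
            }
            zk.create("/hbase");
            zk.create("/hbase/master");
        });
        master.start();

        // "Region server": started first; waits for /hbase instead of aborting.
        boolean rootSeen = waitForZnode(zk, "/hbase", 2000, 20);
        boolean masterSeen = rootSeen && waitForZnode(zk, "/hbase/master", 2000, 20);
        master.join();
        System.out.println("root znode seen: " + rootSeen + ", master znode seen: " + masterSeen);
    }
}
```

With an immediate existence check in place of the first waitForZnode call, the region
server in this sketch would find no /hbase and give up, which is the first-launch abort
reported in the issue. A bounded wait keeps the failure mode for a cluster whose master
never starts, while tolerating either start order.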
