hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
Date Wed, 25 Apr 2012 01:44:07 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-5849:
---------------------------------

    Attachment: HBASE-5849_v4.patch
                HBASE-5849_v4-0.92.patch
                HBASE-5849_v4.patch

I have found two issues that caused timeouts in the 0.92 branch:
1. The hbase dir was not set up to use the temp dir under target/; it used the default one under
/tmp/hadoop-${username} instead. As a result, running the test on 0.92 prevents the RS from coming up if
there is stale data under /tmp/.
2. Setting timeouts such as @Test(timeout=xxx) causes the 0.92 master not to shut down properly. I
could not investigate this further; there might be an issue with surefire.

As a result, I updated the patch to first boot up a mini DFS and then set up the hbase dir. I
also removed the timeouts (the test runner, maven, will time out instead if something goes
wrong).
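A minimal Java sketch of the isolation idea behind fix 1, assuming a hypothetical helper freshTestDataDir (it is not part of HBaseTestingUtility or any HBase API): each run gets its own empty directory under target/test-data/<random UUID>, so stale data left in a shared location such as /tmp/hadoop-${username} cannot leak into the next run. It does not start a mini DFS, which would need the Hadoop test jars.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
import java.util.stream.Stream;

public class TestDirSketch {
    // Hypothetical helper: creates and returns a fresh directory
    // <buildDir>/test-data/<random UUID> for a single test run.
    static Path freshTestDataDir(Path buildDir) {
        Path dir = buildDir.resolve("test-data").resolve(UUID.randomUUID().toString());
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }

    // True if the directory contains no entries.
    static boolean isEmpty(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = Paths.get("target");
        Path first = freshTestDataDir(target);
        Path second = freshTestDataDir(target);
        // Two runs never share a directory, and each one starts out empty,
        // unlike a fixed path under /tmp that keeps data between runs.
        System.out.println(first + " empty=" + isEmpty(first));
        System.out.println(second + " empty=" + isEmpty(second));
    }
}
```

The random UUID is the design choice that matters here: even if an earlier run crashed and left files behind, a new run never reuses that directory.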

All my tests for trunk, 0.94, and 0.92 seem to pass.

@Ted, @Stack, can you please try the patch to see whether you can replicate?

On an unrelated note, the ResourceChecker reports that some of the daemon threads (like LruBlockCache.EvictionThread)
are not shut down properly (even when using MiniHBaseCluster and shutting it down properly).
Any idea whether we should dig into that?
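One way to see the kind of leftover daemon threads mentioned above is to snapshot the live thread names before and after a test and diff them. The sketch below is a hypothetical, stdlib-only illustration of that idea, not how HBase's ResourceChecker is actually implemented; the thread name HypotheticalEvictionThread merely stands in for something like LruBlockCache.EvictionThread.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

public class ThreadLeakSketch {
    // Names of all threads that are alive at this instant.
    static Set<String> liveThreadNames() {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    // Thread names present in "after" but not in "before" (a coarse,
    // name-based check: two threads with the same name collapse into one).
    static Set<String> leaked(Set<String> before, Set<String> after) {
        Set<String> diff = new TreeSet<>(after);
        diff.removeAll(before);
        return diff;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> before = liveThreadNames();

        // Stand-in for a daemon that the code under test starts but does not stop.
        CountDownLatch stop = new CountDownLatch(1);
        Thread daemon = new Thread(() -> {
            try {
                stop.await();
            } catch (InterruptedException ignored) {
                // exit on interrupt
            }
        }, "HypotheticalEvictionThread");
        daemon.setDaemon(true);
        daemon.start();

        System.out.println("Still running: " + leaked(before, liveThreadNames()));

        // Proper shutdown: signal the thread and wait for it to terminate.
        stop.countDown();
        daemon.join();
        System.out.println("After shutdown: " + leaked(before, liveThreadNames()));
    }
}
```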
                
> On first cluster startup, RS aborts if root znode is not available
> ------------------------------------------------------------------
>
>                 Key: HBASE-5849
>                 URL: https://issues.apache.org/jira/browse/HBASE-5849
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver, zookeeper
>    Affects Versions: 0.92.2, 0.96.0, 0.94.1
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.92.2, 0.94.0
>
>         Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch, HBASE-5849_v4-0.92.patch,
>                      HBASE-5849_v4.patch, HBASE-5849_v4.patch, HBASE-5849_v4.patch
>
>
> When launching a brand-new cluster, the master has to be started first; otherwise, starting the
> master and the RS at the same time can cause a race condition.
> Master startup code is something like this:
>  - establish zk connection
>  - create root znodes in zk (/hbase)
>  - create ephemeral node for master /hbase/master
> Region server startup code is something like this:
>  - establish zk connection
>  - check whether the root znode (/hbase) is there. If not, shut down.
>  - wait for the master to create znodes /hbase/master
> So the problem is that on the very first launch of the cluster, the RS aborts on startup because the
> /hbase znode might not have been created yet (only the master creates it, if needed). Since
> /hbase is not deleted on cluster shutdown, the order in which the servers are started does not
> matter on subsequent cluster starts. So this affects only first launches.
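The ordering problem can be reproduced with a toy in-memory model. This is a hypothetical sketch, not HBase or ZooKeeper code: ToyZk stands in for ZooKeeper, persistent znodes survive a restart while ephemeral ones vanish when sessions close, and the "establish zk connection" step is omitted. It shows the RS aborting when it starts before the master on the first launch, and starting fine in the same order on the second launch because /hbase persisted.

```java
import java.util.HashSet;
import java.util.Set;

public class StartupRaceSketch {
    // Toy stand-in for ZooKeeper: persistent znodes survive a cluster restart,
    // ephemeral znodes disappear when their owning sessions close.
    static final class ToyZk {
        final Set<String> persistent = new HashSet<>();
        final Set<String> ephemeral = new HashSet<>();

        boolean exists(String path) {
            return persistent.contains(path) || ephemeral.contains(path);
        }

        // Cluster shutdown: all sessions close, so only ephemeral znodes go away.
        void shutdownCluster() {
            ephemeral.clear();
        }
    }

    // Master: create the root znode /hbase if needed, then the ephemeral /hbase/master.
    static void startMaster(ToyZk zk) {
        zk.persistent.add("/hbase");
        zk.ephemeral.add("/hbase/master");
    }

    // RS: abort if /hbase is missing, as described in this issue.
    // Returns true if the RS keeps running.
    static boolean startRegionServer(ToyZk zk) {
        if (!zk.exists("/hbase")) {
            return false; // root znode not there yet: the RS shuts down
        }
        // A real RS would now wait for /hbase/master; that step is omitted here.
        return true;
    }

    public static void main(String[] args) {
        ToyZk zk = new ToyZk();

        // First launch: the RS wins the race against the master.
        boolean firstLaunch = startRegionServer(zk);
        startMaster(zk);
        System.out.println("first launch, RS first: RS up = " + firstLaunch);   // false

        zk.shutdownCluster();

        // Second launch in the same order: /hbase survived the shutdown.
        boolean secondLaunch = startRegionServer(zk);
        startMaster(zk);
        System.out.println("second launch, RS first: RS up = " + secondLaunch); // true
    }
}
```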

--
This message is automatically generated by JIRA.

        
