hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13935) Orphaned namespace table ZK node should not prevent master to start
Date Thu, 18 Jun 2015 19:11:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592351#comment-14592351 ]

Matteo Bertozzi commented on HBASE-13935:

{quote}If we have an orphaned ENABLING znode, "this.assignmentManager.joinCluster();" was executed before HMaster#initNamespace()
was called, and it calls "AssignmentManager#recoverTableInEnablingState()" to remove
the ENABLING znode. That is why my unit test only sets the state to ENABLED; my guess is that the orphaned
znode in the test is probably in the ENABLED state.{quote}
The znode state is fine. What I don't know (sorry, I haven't looked at the code yet) is what
happens if we keep going when we already have some state on disk. I know that if we are in the
same situation as the unit test, everything is fine, but is that a realistic situation? Can we end
up with some data in the dir, abort, restart the master, skip the znode check, and then add
another region to the system table, which will cause hbck to complain?
If the above is not possible, the patch is good.

In proc-v2, the AM's recoverTableInState() is still there, and it does the wrong thing for us.
I think there is a JIRA to remove that.
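The startup ordering described in the quoted comment can be modeled with a small Java sketch. This is not HBase code: the class, the `znodes` map, and the `joinCluster()`/`initNamespace()` methods are hypothetical stand-ins. They only mimic the ordering as the quote describes it, where `joinCluster()` clears an orphaned ENABLING znode before `initNamespace()` runs, so only an orphaned ENABLED znode survives to trip the table-exists check.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the master startup ordering discussed in this thread.
// All names are hypothetical; this does not use the real HBase API.
public class StartupOrderSketch {
    enum TableState { ENABLING, ENABLED }

    static final String NS_TABLE = "hbase:namespace";

    // Stand-in for the table-state znodes kept in ZooKeeper.
    private final Map<String, TableState> znodes = new HashMap<>();

    StartupOrderSketch(TableState orphanedState) {
        if (orphanedState != null) {
            znodes.put(NS_TABLE, orphanedState); // znode left by an earlier failed start
        }
    }

    // Models AssignmentManager#joinCluster(): per the quoted comment, recovery
    // of tables in ENABLING state removes the ENABLING znode.
    void joinCluster() {
        znodes.values().removeIf(s -> s == TableState.ENABLING);
    }

    // Models HMaster#initNamespace(): creating the namespace table fails if a
    // znode for it still exists (the TableExistsException path).
    boolean initNamespace() {
        if (znodes.containsKey(NS_TABLE)) {
            return false; // the master would abort here
        }
        znodes.put(NS_TABLE, TableState.ENABLED);
        return true;
    }

    // Runs the two steps in the order described in the quote.
    static boolean startMaster(TableState orphanedState) {
        StartupOrderSketch master = new StartupOrderSketch(orphanedState);
        master.joinCluster();
        return master.initNamespace();
    }

    public static void main(String[] args) {
        System.out.println("no orphan:       " + startMaster(null));
        System.out.println("orphan ENABLING: " + startMaster(TableState.ENABLING));
        System.out.println("orphan ENABLED:  " + startMaster(TableState.ENABLED));
    }
}
```

In this model, an orphaned ENABLING znode is cleaned up before the namespace table is created, while an orphaned ENABLED znode reproduces the abort. This matches the reason given for the unit test only using the ENABLED state.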

> Orphaned namespace table ZK node should not prevent master to start
> -------------------------------------------------------------------
>                 Key: HBASE-13935
>                 URL: https://issues.apache.org/jira/browse/HBASE-13935
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.0.0, 0.98.13
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 0.98.14, 1.0.2
>         Attachments: HBASE-13935.v1-0.98.patch, HBASE-13935.v1-branch-1.0.patch
> Before the state-of-the-art Procedure V2 feature (HBase 1.0 release or older), we
> frequently saw the following issue (an orphaned ZK node) that prevents the master from starting
> (at least in testing):
> {noformat}
> 2015-06-16 17:54:36,472 FATAL [master:] master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.TableExistsException: hbase:namespace
> 	at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:137)
> 	at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
> 	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
> 	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1123)
> 	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:947)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:618)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-06-16 17:54:36,472 INFO  [master:] master.HMaster: Aborting
> {noformat}
> The above call trace is from a 0.98.x test run. We saw a similar issue in a 1.0.x run, too.
> The proposed fix is to ignore the ZK node and force the namespace table creation to complete
> so that the master can start successfully.
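Here is a minimal sketch of the proposed fix. It uses hypothetical names rather than the real `CreateTableHandler`/`TableNamespaceManager` API. In this model, the table-exists check in `prepare()` can be skipped for the namespace table, so a stale znode is ignored and creation runs to completion. The `skipZnodeCheck` flag and the in-memory sets are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the proposed fix; all names are hypothetical and this
// is not the real HBase CreateTableHandler/TableNamespaceManager code.
public class NamespaceFixSketch {
    static final String NS_TABLE = "hbase:namespace";

    // Hypothetical stand-in for org.apache.hadoop.hbase.TableExistsException.
    static class TableExistsException extends Exception {
        TableExistsException(String table) {
            super(table);
        }
    }

    private final Set<String> tableZnodes = new HashSet<>();
    private final Set<String> createdTables = new HashSet<>();

    // Models CreateTableHandler#prepare(): refuse to create a table whose
    // znode already exists, unless the caller asks to skip that check.
    private void prepare(String table, boolean skipZnodeCheck) throws TableExistsException {
        if (!skipZnodeCheck && tableZnodes.contains(table)) {
            throw new TableExistsException(table);
        }
    }

    private void createTable(String table, boolean skipZnodeCheck) throws TableExistsException {
        prepare(table, skipZnodeCheck);
        tableZnodes.add(table);   // (re)write the table-state znode
        createdTables.add(table); // stand-in for creating the table and its region
    }

    // Starts a "master" that finds an orphaned hbase:namespace znode left by
    // an earlier failed start. With applyFix, the znode is ignored.
    static boolean startWithOrphanedZnode(boolean applyFix) {
        NamespaceFixSketch master = new NamespaceFixSketch();
        master.tableZnodes.add(NS_TABLE);
        try {
            master.createTable(NS_TABLE, applyFix);
            return master.createdTables.contains(NS_TABLE);
        } catch (TableExistsException e) {
            return false; // without the fix, the master aborts here
        }
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + startWithOrphanedZnode(false));
        System.out.println("with fix:    " + startWithOrphanedZnode(true));
    }
}
```

Skipping the check unconditionally is exactly what the comment questions. If an earlier attempt already left table data on disk, forcing creation could add another region to the system table, and the sketch does not model that case.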

This message was sent by Atlassian JIRA
