hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
Date Thu, 29 Nov 2012 21:14:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506801#comment-13506801 ]

Jean-Daniel Cryans commented on HBASE-5926:

This JIRA has the odd side effect of printing a lot of noise when HBase runs in standalone mode and the process is killed with kill -9. The gist of it:

2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2012-11-29 13:08:27,227 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 0 retries
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: clean znode for master Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:291)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:562)
        at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:168)
        at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:150)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:110)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2298)

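Basically, the znode cleaner fails hard because ZooKeeper is already offline: in standalone mode ZooKeeper runs in the same JVM as the master, so the kill -9 takes it down too.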
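One possible way to keep this quiet, offered as an assumption and not as what HBase actually did, is for the cleaner to treat a connection loss as an expected outcome and report it in a single line instead of a stack trace. Assuming the master znode is ephemeral, ZooKeeper's own session expiry would still remove it later. In the sketch, ZnodeClearerSketch, ZkOps, clearMasterZnode and the address "host1:60000" are all hypothetical. ConnectionLossException is a local stand-in for the real org.apache.zookeeper.KeeperException.ConnectionLossException, so the example compiles without the ZooKeeper jar.

```java
/** Sketch: report a ZooKeeper connection loss during master znode cleanup in one line. */
public class ZnodeClearerSketch {

    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException. */
    public static final class ConnectionLossException extends Exception {
        public ConnectionLossException(String path) {
            super("KeeperErrorCode = ConnectionLoss for " + path);
        }
    }

    /** Hypothetical minimal view of the two ZooKeeper calls the cleaner needs. */
    public interface ZkOps {
        String getData(String path) throws ConnectionLossException;
        void delete(String path) throws ConnectionLossException;
    }

    /**
     * Deletes the master znode if it still holds the dead master's address.
     * A connection loss (e.g. ZooKeeper died together with a standalone process)
     * is reported as a one-line status without a stack trace; the ephemeral znode
     * is then left to ZooKeeper's session expiry.
     */
    public static String clearMasterZnode(ZkOps zk, String path, String deadMaster) {
        try {
            if (!deadMaster.equals(zk.getData(path))) {
                return "left " + path + " untouched";
            }
            zk.delete(path);
            return "deleted " + path;
        } catch (ConnectionLossException e) {
            return "ZooKeeper unreachable, skipping cleanup of " + path + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simulates the standalone kill -9 case: every ZooKeeper call fails.
        ZkOps offline = new ZkOps() {
            @Override
            public String getData(String path) throws ConnectionLossException {
                throw new ConnectionLossException(path);
            }

            @Override
            public void delete(String path) throws ConnectionLossException {
                throw new ConnectionLossException(path);
            }
        };
        System.out.println(clearMasterZnode(offline, "/hbase/master", "host1:60000"));
    }
}
```

Running main prints a single line, "ZooKeeper unreachable, skipping cleanup of /hbase/master: KeeperErrorCode = ConnectionLoss for /hbase/master", instead of the WARN/ERROR lines and stack trace quoted above.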

I was confused to see more logs being printed after the kill.

> Delete the master znode after a master crash
> --------------------------------------------
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region servers there is one znode per region server, while the master and the backup masters share a single znode.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and becomes the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.
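The stack trace in the comment passes through MasterAddressTracker.deleteIfEquals. That suggests the cleaner compares the znode's content with the dead master's address before deleting it, instead of deleting blindly. The in-memory sketch that follows replays the scenario to show why that check matters. Everything in it is hypothetical: FakeZk, naiveClear, deleteIfEquals and the addresses "host1:60000" and "host2:60000". It is not HBase's code or the ZooKeeper client API. Its read-then-delete is also not atomic. With real ZooKeeper, the versioned ZooKeeper.delete(path, version) call would narrow that window.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: delete the shared master znode only if it still holds the dead master's address. */
public class MasterZnodeCleanup {

    /** Hypothetical in-memory stand-in for the ZooKeeper namespace (path -> data). */
    public static final class FakeZk {
        private final Map<String, String> nodes = new HashMap<>();

        /** Creates a znode; returns false if it already exists (only one active master). */
        public boolean create(String path, String data) {
            return nodes.putIfAbsent(path, data) == null;
        }

        /** Returns the znode's data, or null if the znode does not exist. */
        public String getData(String path) {
            return nodes.get(path);
        }

        /** Unconditionally deletes the znode, if present. */
        public void delete(String path) {
            nodes.remove(path);
        }
    }

    /** Naive cleanup, as if the per-region-server strategy were reused for the master. */
    public static void naiveClear(FakeZk zk, String path) {
        zk.delete(path);
    }

    /** Deletes the znode only if its content equals the dead master's address (not atomic). */
    public static boolean deleteIfEquals(FakeZk zk, String path, String deadMasterAddress) {
        String current = zk.getData(path);
        if (!deadMasterAddress.equals(current)) {
            return false; // znode is gone or owned by another master: leave it alone
        }
        zk.delete(path);
        return true;
    }

    public static void main(String[] args) {
        String path = "/hbase/master";
        FakeZk zk = new FakeZk();
        zk.create(path, "host1:60000");   // 1) master starts
        zk.delete(path);                  // 3-4) master dies, ZK removes its ephemeral znode
        zk.create(path, "host2:60000");   // 5-6) backup master takes over

        // 7-8) the previous master's script finally runs its cleanup
        boolean deleted = deleteIfEquals(zk, path, "host1:60000");
        System.out.println("deleted=" + deleted + ", master=" + zk.getData(path));

        naiveClear(zk, path);             // 9) the naive variant removes the new master's znode
        System.out.println("after naive clear, master=" + zk.getData(path));
    }
}
```

Running main prints "deleted=false, master=host2:60000" and then "after naive clear, master=null". The content check leaves the new master's znode alone, while the unconditional delete reproduces step 9.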

This message is automatically generated by JIRA.
