hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5926) Delete the master znode after a master crash
Date Thu, 29 Nov 2012 21:14:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506801#comment-13506801 ]

Jean-Daniel Cryans commented on HBASE-5926:
-------------------------------------------

This jira has the odd side effect of printing a lot of garbage when running in standalone mode and killing the process with -9. The gist of it:

{noformat}
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2012-11-29 13:08:27,227 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 0 retries
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: clean znode for master Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:291)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:562)
        at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:168)
        at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:150)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:110)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2298)
{noformat}

Basically, the znode cleaner fails hard because ZK is already offline: in standalone mode ZooKeeper runs inside the same process that was just killed.

I was confused to see more logs being printed out after running the kill.
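For illustration only, here is a minimal sketch of a cleanup step that degrades quietly when ZK is unreachable: it emits one warning line instead of a retry message plus a full stack trace. `ConnectionLost` and `ZNodeCleanup` are hypothetical stand-ins for `KeeperException.ConnectionLossException` and for whatever `ZNodeClearer.clear` actually runs; this is not how HBase implements the cleaner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for KeeperException.ConnectionLossException.
class ConnectionLost extends Exception {
    ConnectionLost(String path) { super("ConnectionLoss for " + path); }
}

// Hypothetical cleanup step, e.g. something like deleteIfEquals on /hbase/master.
interface ZNodeCleanup {
    void run() throws ConnectionLost;
}

class QuietCleaner {
    // Collected log lines, so callers (and tests) can inspect what was printed.
    final List<String> log = new ArrayList<>();

    /** Returns true if the cleanup ran, false if ZooKeeper was unreachable. */
    boolean cleanOrSkip(String znode, ZNodeCleanup cleanup) {
        try {
            cleanup.run();
            log.add("INFO cleaned " + znode);
            return true;
        } catch (ConnectionLost e) {
            // ZK cannot be reached, so there is nothing we can clean:
            // record a single line instead of a stack trace.
            log.add("WARN skipping cleanup of " + znode + ": " + e.getMessage());
            return false;
        }
    }
}
```

Whether skipping is the right policy outside standalone mode (where ZK may only be briefly unreachable) is a separate question; the sketch only shows the quieter failure path.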
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: there is one znode per region server, while the master and the backup masters share a single znode.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and becomes the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.
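The scenario above can be replayed in a few lines. This is a sketch against a hypothetical in-memory `FakeZk` (a path-to-data map), not the real ZooKeeper client, with made-up addresses `master-a` and `master-b`. It contrasts an unconditional delete with a delete that only fires when the znode still holds the dead master's address, in the spirit of the `MasterAddressTracker.deleteIfEquals` call visible in the stack trace. In this in-memory version the compare and the delete are one atomic step; against a real ensemble they would be separate calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the ZooKeeper namespace (path -> data). */
class FakeZk {
    private final Map<String, String> nodes = new HashMap<>();

    /** Creates the node; returns false if it already exists. */
    synchronized boolean create(String path, String data) {
        return nodes.putIfAbsent(path, data) == null;
    }

    synchronized String get(String path) { return nodes.get(path); }

    /** Unconditional delete (also used to model ephemeral-node expiry). */
    synchronized void delete(String path) { nodes.remove(path); }

    /** Deletes the node only if it still holds the expected data. */
    synchronized boolean deleteIfEquals(String path, String expected) {
        return nodes.remove(path, expected);
    }
}

class MasterZNodeRace {
    static final String MASTER = "/hbase/master";

    /** Replays steps 1-9; returns the master znode content at the end (null if gone). */
    static String replay(boolean conditionalDelete) {
        FakeZk zk = new FakeZk();
        zk.create(MASTER, "master-a");   // 1) master starts and owns the znode
        zk.create(MASTER, "master-b");   // 2) backup starts; znode taken, so this fails
        zk.delete(MASTER);               // 3-4) master dies, ZK removes its ephemeral znode
        zk.create(MASTER, "master-b");   // 5-6) backup becomes the active master
        // 7-8) the dead master's script runs its cleanup afterwards
        if (conditionalDelete) {
            zk.deleteIfEquals(MASTER, "master-a");
        } else {
            zk.delete(MASTER);
        }
        return zk.get(MASTER);           // 9) is the new master's znode still there?
    }
}
```

`System.out.println(MasterZNodeRace.replay(false))` prints `null` (the new master's znode was wiped), while `MasterZNodeRace.replay(true)` returns `master-b`.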

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
