Date: Fri, 10 Jan 2014 22:49:50 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Resolved] (HBASE-10310) ZNodeCleaner session expired for /hbase/master

     [ https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-10310.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.99.0
                   0.96.2
                   0.98.0
     Hadoop Flags: Reviewed

Committed to trunk, 0.98, and 0.96. Thanks for the patch Samir!

> ZNodeCleaner session expired for /hbase/master
> ----------------------------------------------
>
>                 Key: HBASE-10310
>                 URL: https://issues.apache.org/jira/browse/HBASE-10310
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.1.1
>         Environment: x86_64 GNU/Linux
>            Reporter: Samir Ahmic
>            Assignee: Samir Ahmic
>             Fix For: 0.98.0, 0.96.2, 0.99.0
>
>         Attachments: HBASE-10310.patch
>
>
> I was testing the "hbase master clear" command while working on [HBASE-7386]. Here are the command and the exception:
> {code}
> $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear
> 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=90000 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase
> 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181
> 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session
> 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 40000
> 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed
> 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down
> 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
> 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0...
> 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
> 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts
> 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
>     at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
>     at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
> 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
>     at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
>     at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
> 14/01/10 14:05:45 WARN zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
>     at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
>     at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
> {code}
> After checking ZNodeClearer.java I noticed these lines:
> {code}
>   try {
>     znodeFileContent = ZNodeClearer.readMyEphemeralNodeOnDisk();
>
>   } catch (FileNotFoundException fnfe) {
>     // If no file, just keep going -- return success.
>     LOG.warn("Can't find the znode file; presume non-fatal", fnfe);
>     return true;
>   } catch (IOException e) {
>     LOG.warn("Can't read the content of the znode file", e);
>     return false;
>   } finally {
>     zkw.close();
>   }
>   return MasterAddressTracker.deleteIfEquals(zkw, znodeFileContent);
> }
> {code}
> Looks like we are closing the ZooKeeper connection prematurely: the finally block runs zkw.close() before deleteIfEquals uses zkw. After moving
> {code}
> return MasterAddressTracker.deleteIfEquals(zkw, znodeFileContent);
> {code}
> inside the try block, the issue was fixed.
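
The fix described in the issue can be sketched in isolation. This is only an illustration of the pattern, not the committed patch: FakeWatcher, ZNodeFileReader, clearBuggy and clearFixed are hypothetical stand-ins invented here, and no real HBase or ZooKeeper classes are used. The stand-in deleteIfEquals returns false when the session is closed, which is an assumption modelled on the "Can't get or delete the master znode" warning in the log.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class ZNodeClearSketch {

    /** Hypothetical stand-in for a ZooKeeper-backed watcher; not the HBase class. */
    static class FakeWatcher implements AutoCloseable {
        private final String masterZNodeData;
        private boolean closed = false;

        FakeWatcher(String masterZNodeData) {
            this.masterZNodeData = masterZNodeData;
        }

        /** Fails once the session has been closed, like the getData call in the log. */
        String getData() {
            if (closed) {
                throw new IllegalStateException("Session expired for /hbase/master");
            }
            return masterZNodeData;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for reading the znode file from disk. */
    interface ZNodeFileReader {
        String read() throws IOException;
    }

    /** Stand-in comparison: reports failure (false) if the session is unusable. */
    static boolean deleteIfEquals(FakeWatcher zkw, String expected) {
        try {
            return expected.equals(zkw.getData());
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Shape of the reported bug: finally closes the watcher before it is used. */
    static boolean clearBuggy(FakeWatcher zkw, ZNodeFileReader reader) {
        String content;
        try {
            content = reader.read();
        } catch (FileNotFoundException fnfe) {
            return true;   // no file: presume non-fatal
        } catch (IOException e) {
            return false;
        } finally {
            zkw.close();
        }
        return deleteIfEquals(zkw, content);   // zkw is already closed here
    }

    /** Shape of the fix: the watcher is used inside try, so finally closes it afterwards. */
    static boolean clearFixed(FakeWatcher zkw, ZNodeFileReader reader) {
        try {
            String content = reader.read();
            return deleteIfEquals(zkw, content);
        } catch (FileNotFoundException fnfe) {
            return true;   // no file: presume non-fatal
        } catch (IOException e) {
            return false;
        } finally {
            zkw.close();
        }
    }

    public static void main(String[] args) {
        ZNodeFileReader reader = () -> "master-a";
        System.out.println(clearBuggy(new FakeWatcher("master-a"), reader)); // prints "false"
        System.out.println(clearFixed(new FakeWatcher("master-a"), reader)); // prints "true"
    }
}
```

In both variants the finally block still closes the watcher on every path, including the early returns from the catch blocks. The only difference is whether the close happens before or after the watcher's last use.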