Date: Wed, 30 May 2012 14:26:23 +0000 (UTC)
From: "suja s (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <335768304.16361.1338387983137.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <448999917.16310.1338387383639.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Updated] (HDFS-3477) FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/HDFS-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suja s updated HDFS-3477:
-------------------------

    Description:
Format and ZKFC startup flows continue further after creation of zkclient connection without waiting to check whether the connection is completely established.
This leads to failure at the subsequent point if connection was not complete by then.

Exception trace for format
{noformat}
12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
    at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
    at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
    at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
    ... 8 more
{noformat}

  was:
Format and ZKFC startup flows continue further after creation of zkclient connection without waiting to check whether the connection is completely established.
This leads to failure at the subsequent point if connection was not complete by then.

> FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-3477
>                 URL: https://issues.apache.org/jira/browse/HDFS-3477
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover
>    Affects Versions: 3.0.0
>            Reporter: suja s
>            Assignee: Rakesh R
>
> Format and ZKFC startup flows continue further after creation of zkclient connection without waiting to check whether the connection is completely established.
> This leads to failure at the subsequent point if connection was not complete by then.
> Exception trace for format
> {noformat}
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
> 12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
> 12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
> Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
>     at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
>     at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
>     at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
>     at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
>     at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
>     at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
>     at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
>     ... 8 more
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
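
The race reported in this issue (using the client before its connection is established) is commonly closed by blocking until the client's event callback reports a connected state. The sketch below models that pattern with JDK classes only. `ConnectionGate`, `markConnected`, and `awaitConnected` are hypothetical names introduced for illustration, not Hadoop or ZooKeeper APIs, and this is not necessarily how HDFS-3477 was resolved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Hadoop or ZooKeeper): blocks a caller
 * until a connection callback reports that the session is established.
 */
final class ConnectionGate {
    // Released once, when the first "connected" notification arrives.
    private final CountDownLatch connected = new CountDownLatch(1);

    /**
     * Call from the client's event callback. With the ZooKeeper client this
     * would be Watcher.process(), on an event whose state is
     * KeeperState.SyncConnected.
     */
    void markConnected() {
        connected.countDown();
    }

    /**
     * Waits up to timeoutMs for markConnected() to be called.
     * Returns true if the connection was reported in time, false on
     * timeout or if the waiting thread was interrupted.
     */
    boolean awaitConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt so the caller can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Under these assumptions, a flow such as formatZK would create the client with a watcher that calls `markConnected()`, then check `awaitConnected(timeout)` before its first `exists()` call, and fail with an explicit "could not connect" error when it returns `false` instead of surfacing a `ConnectionLossException` later.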