hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3477) FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
Date Sat, 09 Jun 2012 00:06:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292109#comment-13292109 ]

Todd Lipcon commented on HDFS-3477:
-----------------------------------

Hey Rakesh. Looking good. One more small comment which I missed:

- We currently have the following code lower down inside {{doRun()}}:
{code}
      if (ioe.getCause() instanceof KeeperException.ConnectionLossException) {
        LOG.fatal("Unable to start failover controller. Unable to connect " +
            "to ZooKeeper quorum at " + zkQuorum + ". Please check the " +
            "configured value for " + ZK_QUORUM_KEY + " and ensure that " +
            "ZooKeeper is running.");
        return ERR_CODE_NO_ZK;
      } else {
        throw ioe;
      }
{code}

but with your bug fix, we'll never get here, since {{initZK()}} would already have failed.
So, can you move this fatal error message up to where you catch {{KeeperException}} around
{{initZK()}}? That way we'll give a helpful message to users who have a mistake in their configuration.
Thanks.
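
For illustration only, here is a rough sketch of what the request above might look like. It is not the actual patch. {{KeeperExceptionStandIn}} and {{ZkInit}} are hypothetical stand-ins so the example compiles without ZooKeeper or Hadoop on the classpath, the values of {{ERR_CODE_NO_ZK}} and {{ZK_QUORUM_KEY}} are assumed, and {{System.err}} stands in for {{LOG.fatal}}.

```java
class ZkfcStartupSketch {
    // Illustrative values for this sketch; the real constants live in ZKFailoverController.
    static final int ERR_CODE_NO_ZK = 6;
    static final String ZK_QUORUM_KEY = "ha.zookeeper.quorum";

    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException. */
    static class KeeperExceptionStandIn extends Exception {
        KeeperExceptionStandIn(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the step that creates the ZooKeeper client. */
    interface ZkInit {
        void initZK() throws KeeperExceptionStandIn;
    }

    /** Reports a connection failure where initZK() is called, not further down. */
    static int doRun(ZkInit zk, String zkQuorum) {
        try {
            zk.initZK();
        } catch (KeeperExceptionStandIn ke) {
            // The real code would use LOG.fatal here.
            System.err.println("Unable to start failover controller. Unable to connect "
                + "to ZooKeeper quorum at " + zkQuorum + ". Please check the "
                + "configured value for " + ZK_QUORUM_KEY + " and ensure that "
                + "ZooKeeper is running.");
            return ERR_CODE_NO_ZK;
        }
        // Format or election logic would continue here once connected.
        return 0;
    }

    public static void main(String[] args) {
        int rc = doRun(() -> {
            throw new KeeperExceptionStandIn("ConnectionLoss");
        }, "zk1:2181");
        System.out.println("exit code: " + rc);
    }
}
```

With this shape, a misconfigured quorum is reported at the point where the connection attempt fails, so the later {{ConnectionLossException}} check inside {{doRun()}} would no longer be the only place the message can appear.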
                
> FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-3477
>                 URL: https://issues.apache.org/jira/browse/HDFS-3477
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover
>    Affects Versions: 2.0.1-alpha
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: HDFS-3477.1.patch, HDFS-3477.2.patch, HDFS-3477.patch
>
>
> Format and ZKFC startup flows continue after creating the zkclient connection
> without waiting to check whether the connection is fully established. This leads to
> a failure at the subsequent step if the connection was not complete by then.
> Exception trace for format:
> {noformat}
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
> 12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
> 12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
> Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
>         at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
>         at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
>         at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
>         at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
>         at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
>         at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
>         at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
>         at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
>         at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
>         ... 8 more
> {noformat}
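
The race described in the quoted issue text is usually avoided by blocking until the client reports that it is connected. Below is a minimal sketch of that idea, not the patch attached to this issue. {{ConnectedLatch}} and {{connectAndWait}} are hypothetical names, and a background thread simulates the asynchronous connection so the example runs without ZooKeeper. In a real client, {{onConnected()}} would presumably be called from {{Watcher.process()}} when the event state is {{SyncConnected}}.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectionWaitSketch {
    /** Hypothetical helper: a latch released by the connection callback. */
    static class ConnectedLatch {
        private final CountDownLatch connected = new CountDownLatch(1);

        /** Called by the client's event callback once the session is established. */
        void onConnected() {
            connected.countDown();
        }

        /** Blocks until connected, or throws IOException when the timeout elapses. */
        void await(long timeoutMs) throws IOException, InterruptedException {
            if (!connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out after " + timeoutMs
                    + " ms waiting for the ZooKeeper connection");
            }
        }
    }

    /** Simulates a client that connects after connectDelayMs; returns true if we saw it connect in time. */
    static boolean connectAndWait(long connectDelayMs, long timeoutMs) {
        ConnectedLatch latch = new ConnectedLatch();
        Thread fakeClient = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs);
                latch.onConnected();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fakeClient.setDaemon(true);
        fakeClient.start();
        try {
            latch.await(timeoutMs);
            return true; // safe to issue exists() and similar calls from here on
        } catch (IOException e) {
            return false; // connection was not established within the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast connect: " + connectAndWait(50, 2000));
        System.out.println("slow connect: " + connectAndWait(2000, 50));
    }
}
```

Waiting on the latch before the first znode operation turns a slow connection into either a short delay or a clear timeout error, rather than a {{ConnectionLossException}} from the first {{exists()}} call.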

