hadoop-common-user mailing list archives

From Pavan Kumar Polineni <smartsunny...@gmail.com>
Subject Re: auto-failover does not work
Date Mon, 02 Dec 2013 12:13:10 GMT
Please post your config files and describe which method you are following
for automatic failover.
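As a starting point for the requested config, the HA-related settings can be pulled out of the site files with a short program. This is a minimal Java sketch that uses only the JDK's DOM parser. The class name `HaConfigDump` and its list of property-name prefixes are assumptions for illustration, not a Hadoop tool. It can read `hdfs-site.xml` and also `core-site.xml`, where `ha.zookeeper.quorum` is normally set.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hadoop): lists the HA-related properties
// found in Hadoop *-site.xml files, e.g. hdfs-site.xml and core-site.xml.
final class HaConfigDump {

    // Assumed filter: property names that matter for HA and automatic failover.
    static boolean isHaRelated(String name) {
        return name.equals("dfs.nameservices")
                || name.startsWith("dfs.ha.")
                || name.startsWith("ha.")
                || name.startsWith("dfs.namenode.rpc-address.")
                || name.equals("dfs.namenode.shared.edits.dir")
                || name.startsWith("dfs.client.failover.proxy.provider.");
    }

    // Parses one <configuration> document and returns matching name/value pairs, sorted by name.
    static Map<String, String> haProperties(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = childText(prop, "name");
                if (name != null && isHaRelated(name)) {
                    String value = childText(prop, "value");
                    result.put(name, value == null ? "" : value);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Trimmed text of the first <tag> element under parent, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                System.out.println("== " + path);
                haProperties(in).forEach((k, v) -> System.out.println(k + " = " + v));
            }
        }
    }
}
```

After compiling, it could be run as `java HaConfigDump /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml`; those paths are only an example and depend on the installation. The output includes `dfs.ha.fencing.methods`, which answers the "which method" question directly.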


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:

> Hi,
>   I'm testing HA automatic failover in hadoop-2.2.0.
>
>   Manual failover works on the cluster, but automatic failover fails.
> I set up HA according to the following URL:
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>
>   To test automatic failover, I killed my active NN with kill -9
> <Pid-nn>, but the standby NameNode did not change to the active state.
>   The log from my DFSZKFailoverController is shown in [1].
>
>  Please help me; any suggestions will be appreciated.
>
>
> Regards.
>
>
> zkfc log [1]:
> ----------------------------------------------------------------------------------------------------
>
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
> com.jcraft.jsch.JSchException: Auth fail
>     at com.jcraft.jsch.Session.connect(Session.java:452)
>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>
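In the log, NodeFencer has exactly one configured method (`Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)`). The SSH connection to hadoop3 is established, but public-key authentication as user hadoop fails with `JSchException: Auth fail`. That leads to "Unable to fence service by any configured method", and the ZKFC on hadoop2 yields from the election instead of making its NameNode active. The Java sketch below is a simplified model of that ordered fallback. It is not Hadoop's NodeFencer source; the class name `FencingChainModel` and the lambdas standing in for real fencing methods are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified model of the fencing step seen in the zkfc log. This is NOT
// Hadoop's NodeFencer; it only mimics "try each configured method in order,
// stop at the first success, fail if none succeeds".
final class FencingChainModel {

    // One entry of a fencing-method list, e.g. "sshfence" or "shell(/bin/true)".
    // The predicate is a hypothetical stand-in for the real fencing attempt.
    static final class Method {
        final String spec;
        final Predicate<String> attempt;

        Method(String spec, Predicate<String> attempt) {
            this.spec = spec;
            this.attempt = attempt;
        }
    }

    // Returns the spec of the first method that succeeded, or null if every
    // method failed (the situation in the log above).
    static String fence(String target, List<Method> methods) {
        for (int i = 0; i < methods.size(); i++) {
            Method m = methods.get(i);
            System.out.printf("Trying method %d/%d: %s%n", i + 1, methods.size(), m.spec);
            if (m.attempt.test(target)) {
                System.out.println("Successfully fenced " + target + " with " + m.spec);
                return m.spec;
            }
            System.out.println("Fencing method " + m.spec + " was unsuccessful.");
        }
        System.out.println("Unable to fence service by any configured method.");
        return null;
    }

    // In this model, the local NameNode may become active only after fencing succeeds.
    static boolean mayBecomeActive(String target, List<Method> methods) {
        return fence(target, methods) != null;
    }

    public static void main(String[] args) {
        String oldActive = "hadoop3/10.7.23.124:8020";
        // Stand-in for sshfence failing with "JSchException: Auth fail".
        Method sshfence = new Method("sshfence", target -> false);
        // Stand-in for a fallback method that always reports success.
        Method shellTrue = new Method("shell(/bin/true)", target -> true);

        System.out.println(mayBecomeActive(oldActive, List.of(sshfence)));            // false
        System.out.println(mayBecomeActive(oldActive, List.of(sshfence, shellTrue))); // true
    }
}
```

With only a failing sshfence entry, the model never allows the transition, which matches the log. Because the host was reachable and only authentication failed, the first things to check are probably whether `dfs.ha.fencing.ssh.private-key-files` points at a private key readable by the user running the ZKFC, and whether the matching public key is authorized for user hadoop on hadoop3. A second, always-succeeding method such as `shell(/bin/true)` would also let failover proceed. Whether that is acceptable depends on the deployment, since it effectively skips fencing.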
