hadoop-common-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Can not auto-failover when unplug network interface
Date Tue, 03 Dec 2013 02:26:54 GMT
This is still because your fence method is configured improperly.
Please paste your fence configuration, and double-check that you can ssh
from the active NN to the standby NN without a password.
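
For comparison, here is a minimal sketch of what a fencing section in
hdfs-site.xml usually looks like. The private key path is an assumption
(use whatever key the hadoop user really has). The shell(/bin/true) line
is an optional last-resort fallback that the HA documentation describes for
setups where the shared edits live on QJM; whether that applies to this
cluster is also an assumption. Methods are tried in order, one per line.

```xml
<!-- hdfs-site.xml fragment (illustrative sketch, not taken from this cluster) -->

<!-- sshfence is tried first; shell(/bin/true) always succeeds, so failover
     can continue when the old active host cannot be reached over ssh.
     Only consider the fallback if shared edits are on QJM. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>

<!-- Assumed key location for the user running the ZKFC. -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
```

Note that sshfence has to log in to the old active host to kill the
NameNode process, so it cannot succeed when that host is unreachable; the
"No route to host" in the quoted log is what that looks like.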


On Tue, Dec 3, 2013 at 10:23 AM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:

> Hi
>    Another auto-failover testing problem:
>
>    My HA setup fails over automatically after I kill the active NN. But when I
> unplug the network interface to simulate a hardware failure, auto-failover
> does not seem to happen, even after waiting for some time. The zkfc logs are in [1].
>
>    I'm using the default sshfence.
>
> [1] zkfc logs:
> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
> 2013-12-03 10:05:59,648 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
>     at com.jcraft.jsch.Util.createSocket(Util.java:386)
>     at com.jcraft.jsch.Session.connect(Session.java:182)
>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-03 10:05:59,649 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> 2013-12-03 10:05:59,649 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
> 2013-12-03 10:05:59,650 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-03 10:05:59,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2013-12-03 10:05:59,676 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142931031810260 closed
> 2013-12-03 10:06:00,678 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5ce2acea
> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop1/10.7.23.122:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop1/10.7.23.122:2181, initiating session
> 2013-12-03 10:06:00,709 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop1/10.7.23.122:2181, sessionid = 0x142931031810261, negotiated timeout = 5000
> 2013-12-03 10:06:00,711 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>
