hadoop-common-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.
Date Thu, 19 Jun 2014 10:30:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037211#comment-14037211 ]

Vinayakumar B commented on HADOOP-10722:
----------------------------------------

Ideally, fencing methods should be configured to prevent multiple writers from writing to the same shared storage.

QJM provides fencing on its own, i.e. it won't allow more than one writer at a time,
so external fencing methods need not be configured.
You can remove the SSH fencing method from the configuration on both machines and restart the cluster.
Failover will then happen successfully.

Because dfs.ha.fencing.methods must still be set to something, you can set it to a no-op shell command to skip SSH fencing:
{code:xml}<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>{code}
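If you would rather keep SSH fencing as a first attempt, one possible variant (a sketch, not part of the suggestion above) relies on dfs.ha.fencing.methods accepting a newline-separated list of methods that are tried in order until one succeeds. With shell(/bin/true) listed after sshfence, an unreachable old active host (as in the NoRouteToHostException log below) no longer blocks failover; this is only safe because QJM itself enforces a single writer. It assumes your existing sshfence settings, such as dfs.ha.fencing.ssh.private-key-files, are left unchanged; check the HA documentation for your Hadoop version.

{code:xml}<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- Tried in order: SSH fencing first, then a no-op that always succeeds -->
  <value>sshfence
shell(/bin/true)</value>
</property>{code}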

> Standby NN continuing as standby when active NN machine got shutdown.
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-10722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10722
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auto-failover, ha
>    Affects Versions: 2.4.0
>            Reporter: surendra singh lilhore
>
> I have an HA cluster with 3 ZK and 3 QJM nodes.
> My active NN machine was shut down, but my standby NN is still in standby.
> It should have become active.
> ZKFC logs
> ========
> {noformat}
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to host-10-18-40-101...
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to host-10-18-40-101 port 22
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to host-10-18-40-101 as user myuser
> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
> 	at com.jcraft.jsch.Util.createSocket(Util.java:386)
> 	at com.jcraft.jsch.Session.connect(Session.java:182)
> 	at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
> 	at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
> 	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
> 	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
> 	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
> 	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
