Date: Thu, 19 Jun 2014 10:30:24 +0000 (UTC)
From: "Vinayakumar B (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-10722) Standby NN continuing as standby when active NN machine got shutdown.

    [ https://issues.apache.org/jira/browse/HADOOP-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037211#comment-14037211 ]

Vinayakumar B commented on HADOOP-10722:
----------------------------------------

Ideally, fencing methods should be configured so as not to allow multiple writers to the same shared storage.
QJM supports fencing on its own, i.e. it won't allow multiple writers at a time, so external fencing methods need not be configured.

You can remove the SSH fencing method from the configuration on both machines and restart the cluster.
Then failover will happen successfully.

To skip the SSH fence, you can simply set the following configuration for the fencing methods:
{code:xml}
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
{code}

> Standby NN continuing as standby when active NN machine got shutdown.
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-10722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10722
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auto-failover, ha
>    Affects Versions: 2.4.0
>            Reporter: surendra singh lilhore
>
> I have HA cluster with 3 ZK, 3 QJM.
> My Active NN machine got shutdown, but still my standby NN is standby only.
> It should be active
> ZKFC logs
> ========
> {noformat}
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2014-06-19 13:39:30,810 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to host-10-18-40-101...
> 2014-06-19 13:39:30,811 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to host-10-18-40-101 port 22
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to host-10-18-40-101 as user myuser
> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
> 	at com.jcraft.jsch.Util.createSocket(Util.java:386)
> 	at com.jcraft.jsch.Session.connect(Session.java:182)
> 	at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
> 	at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
> 	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
> 	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
> 	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
> 	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:901)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:800)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-06-19 13:39:33,814 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)