From: Azuryy Yu
To: user@hadoop.apache.org
Date: Tue, 3 Dec 2013 10:26:54 +0800
Subject: Re: Can not auto-failover when unplug network interface

This is still because your fencing method is configured improperly.
Please paste your fencing configuration, and double-check that you can
ssh from the active NN to the standby NN without a password.

On Tue, Dec 3, 2013 at 10:23 AM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:

> Hi
>    Another auto-failover testing problem:
>
>    My HA setup fails over automatically after I kill the active NN.
> However, when I unplug the network interface to simulate a hardware
> failure, auto-failover does not seem to work, even after waiting for
> some time. The zkfc logs are in [1].
>
>    I'm using the default sshfence.
>
>
> [1] zkfc logs
> ----------------------------------------------------------------------
> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
> 2013-12-03 10:05:56,650 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
> 2013-12-03 10:05:56,651 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
> 2013-12-03 10:05:59,648 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
> com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host
>     at com.jcraft.jsch.Util.createSocket(Util.java:386)
>     at com.jcraft.jsch.Session.connect(Session.java:182)
>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-03 10:05:59,649 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> 2013-12-03 10:05:59,649 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
> 2013-12-03 10:05:59,650 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-03 10:05:59,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> 2013-12-03 10:05:59,676 INFO org.apache.zookeeper.ZooKeeper: Session: 0x142931031810260 closed
> 2013-12-03 10:06:00,678 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5ce2acea
> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop1/10.7.23.122:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-12-03 10:06:00,681 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop1/10.7.23.122:2181, initiating session
> 2013-12-03 10:06:00,709 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop1/10.7.23.122:2181, sessionid = 0x142931031810261, negotiated timeout = 5000
> 2013-12-03 10:06:00,711 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
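
For reference, a fencing configuration in hdfs-site.xml might look roughly like the sketch below. This is only an illustration, not the original poster's actual settings: the key path /home/hadoop/.ssh/id_rsa is an assumption (the logs only show that the connection is attempted as user hadoop), and the timeout shown is the documented default. The logs above show sshfence failing with "No route to host", which is expected once the old active's network interface is unplugged, so some setups list a second method such as shell(/bin/true) that is tried when sshfence fails. Whether that fallback is safe depends on the shared-storage setup, so treat it as an option to evaluate rather than a recommendation.

```xml
<!-- hdfs-site.xml (sketch; values are assumptions, not taken from this thread) -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- Methods are listed one per line and tried in order until one succeeds. -->
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- Hypothetical path: private key of the user that sshfence connects as. -->
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <!-- SSH connect timeout in milliseconds. -->
  <value>30000</value>
</property>
```

With a configuration like this, passwordless ssh between the two NameNode hosts still needs to work for the sshfence step whenever the old active is reachable.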