Subject: Re: auto-failover does not work
From: YouPeng Yang <yypvsxf19870706@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 2 Dec 2013 21:59:03 +0800

Hi

   Thanks for your reply. It works.
   Formerly I had set up SSH with a passphrase, so before running start-dfs.sh or stop-dfs.sh I had to enter the passphrase once via ssh-agent bash and ssh-add.
   Now I have recreated the RSA key without a passphrase, and it finally works: HA does the automatic failover.

   Still, I think creating the RSA key with a passphrase is the safer way. Can I achieve HA automatic failover with an SSH setup that includes a passphrase?

Regards
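A minimal sketch of the passphrase-less setup described above, assuming the hadoop user and the hadoop2/hadoop3 hosts used elsewhere in this thread. Note that sshfence reads the key named by dfs.ha.fencing.ssh.private-key-files directly (via JSch), so an ssh-agent session does not help the fencing path, which is why a passphrased key appears not to work here:

  # Generate an RSA key with an empty passphrase (-N "") as the hadoop user;
  # run on each NameNode host (hadoop2 and hadoop3 in this thread).
  ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa

  # Install the public key on both NameNodes, since sshfence must be able
  # to connect in both directions.
  ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop2
  ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop3

  # Verify: both commands must succeed without any prompt.
  ssh hadoop@hadoop2 true
  ssh hadoop@hadoop3 true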
2013/12/2 Jitendra Yadav <jeetuyadav200890@gmail.com>

> If you are using the hadoop user and your SSH configuration is correct, then
> the commands below should work without a password.
>
> Execute from NN2 & NN1
> # ssh hadoop@NN1_host
>
> &
>
> Execute from NN2 & NN1
> # ssh hadoop@NN2_host
>
> Regards
> Jitendra
>
>
> On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
>
>> Hi Jitendra
>>   Yes.
>>   What puzzles me is that I have to run ssh-agent bash & ssh-add before I
>> can SSH between the NNs. Is that a problem?
>>
>> Regards
>>
>>
>> 2013/12/2 Jitendra Yadav <jeetuyadav200890@gmail.com>
>>
>>> Are you able to connect to both NN hosts over SSH without a password?
>>> Make sure you have the correct SSH keys in the authorized keys file.
>>>
>>> Regards
>>> Jitendra
>>>
>>>
>>> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
>>>
>>>> Hi Pavan
>>>>
>>>>   I'm using sshfence.
>>>>
>>>> ------ core-site.xml -----------------
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>fs.defaultFS</name>
>>>>     <value>hdfs://lklcluster</value>
>>>>     <final>true</final>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>hadoop.tmp.dir</name>
>>>>     <value>/home/hadoop/tmp2</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> ------- hdfs-site.xml -------------
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>dfs.namenode.name.dir</name>
>>>>     <value>/home/hadoop/namedir2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.datanode.data.dir</name>
>>>>     <value>/home/hadoop/datadir2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.nameservices</name>
>>>>     <value>lklcluster</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.namenodes.lklcluster</name>
>>>>     <value>nn1,nn2</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:8020</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:8020</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:50070</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:50070</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.namenode.shared.edits.dir</name>
>>>>     <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>>>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.methods</name>
>>>>     <value>sshfence</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>>>>     <value>/home/hadoop/.ssh/id_rsa</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>>>     <value>5000</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.journalnode.edits.dir</name>
>>>>     <value>/home/hadoop/journal/data</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>dfs.ha.automatic-failover.enabled</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>ha.zookeeper.quorum</name>
>>>>     <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>>>>   </property>
>>>> </configuration>
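A hedged aside on the fencing config above: when sshfence cannot authenticate, fencing fails outright and the ZKFC will not promote the standby (exactly the failure in the log further down). dfs.ha.fencing.methods accepts a newline-separated list of methods tried in order, so with quorum journal storage one option sometimes used is a fallback after sshfence; treating shell(/bin/true) as a sufficient fallback is an assumption that QJM alone already prevents split-brain writes:

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>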
>>>>
>>>> 2013/12/2 Pavan Kumar Polineni <smartsunnyb4u@gmail.com>
>>>>
>>>>> Post your config files, and say which method you are using for
>>>>> automatic failover.
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>>>>
>>>>>>   The cluster can be failed over manually; however, automatic failover fails.
>>>>>>   I set up HA according to the URL
>>>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>>>
>>>>>>   When I test automatic failover, I kill my active NN with kill -9 <Pid-nn>, but the standby namenode does not change to the active state.
>>>>>>   The log from my DFSZKFailoverController is shown at [1].
>>>>>>
>>>>>>   Please help me; any suggestion will be appreciated.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>> zkfc log[1]----------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop3...
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_5.3
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
>>>>>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>>>> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>>>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>>>> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>>>> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3' (RSA) to the list of known hosts.
>>>>>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>>>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>>>> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>>>> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>>>> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
>>>>>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
>>>>>> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to hadoop3 as user hadoop
>>>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>>>     at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
>>>>>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session: 0x2429313c808025b closed
>>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop3/10.7.23.124:2181, initiating session
>>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop3/10.7.23.124:2181, sessionid = 0x3429312ba330262, negotiated timeout = 5000
>>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is necessary
>>>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session: 0x3429312ba330262 closed
>>>>>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x3429312ba330262
>>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
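Two hedged checks that may help pin this log down. The "Auth fail" above is JSch failing publickey authentication with the configured key; the first command below approximates what sshfence attempts, using only that key file (IdentitiesOnly keeps any agent out of the picture). The haadmin service IDs nn1/nn2 come from the hdfs-site.xml earlier in the thread:

  # Run on each NN host against the other; it must succeed without any prompt:
  ssh -i /home/hadoop/.ssh/id_rsa -o IdentitiesOnly=yes hadoop@hadoop3 true

  # Watch the failover by hand: check both states, kill -9 the active NN's
  # process, then re-check; the standby should report "active" within seconds.
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2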