hadoop-mapreduce-user mailing list archives

From YouPeng Yang <yypvsxf19870...@gmail.com>
Subject Re: auto-failover does not work
Date Mon, 02 Dec 2013 13:59:03 GMT
Hi
   Thanks for your reply. It works.
   Previously, I set up SSH with a passphrase-protected key, so before running
start-dfs.sh or stop-dfs.sh I had to enter the passphrase once by running
ssh-agent bash and ssh-add.
   Now I have recreated the RSA key without a passphrase, and it finally
works: HA does the automatic failover.

   But I do think it is safer to create the RSA key with a passphrase.
   Can I also achieve HA automatic failover with an SSH setup that
includes a passphrase?
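
On the passphrase question, the HDFS HA documentation says that sshfence must be able to SSH to the target node without providing a passphrase. As far as I can tell, the ZKFC opens the connection itself from the files listed in dfs.ha.fencing.ssh.private-key-files and does not go through an ssh-agent session, which would be consistent with the "Auth fail" in the zkfc log quoted below. A minimal sketch for checking whether a key file can be read without a passphrase, assuming ssh-keygen is on the PATH (key_has_no_passphrase is a made-up helper name, not a Hadoop command):

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the private key can be read with an empty passphrase.
# key_has_no_passphrase is a hypothetical helper name, not part of Hadoop.
key_has_no_passphrase() {
    local key_file="$1"
    # ssh-keygen -y derives the public key from a private key file.
    # -P "" supplies an empty passphrase, so the command fails instead of
    # prompting when the key is passphrase-protected.
    ssh-keygen -y -P "" -f "$key_file" >/dev/null 2>&1
}
```

For example, `key_has_no_passphrase /home/hadoop/.ssh/id_rsa && echo "no passphrase set"`. If the login key has to keep its passphrase, a separate passphrase-less key used only for fencing might be an option, though that is a suggestion and not something tested in this thread.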


Regards


2013/12/2 Jitendra Yadav <jeetuyadav200890@gmail.com>

> If you are using the hadoop user and your ssh configuration is correct,
> then the commands below
> should work without a password.
>
> Execute from NN2 & NN1
> # ssh hadoop@NN1_host
>
> &
>
> Execute from NN2 & NN1
> # ssh hadoop@NN2_host
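
The check above can also be run with no interactive prompt at all. This is only a sketch, assuming the OpenSSH client; the helper names are made up for illustration, and NN1_host and NN2_host are the placeholders from the message above. SSH_AUTH_SOCK is unset in a subshell so that a key loaded earlier with ssh-add does not hide a passphrase problem, since the fencing code does not appear to use the agent:

```shell
#!/usr/bin/env bash
# Sketch: confirm that key-based SSH works without any prompt.
# can_ssh_noninteractive and check_namenode_ssh are hypothetical helper names.
can_ssh_noninteractive() {
    local user="$1" host="$2"
    (
        # Hide any running ssh-agent from this one connection attempt.
        unset SSH_AUTH_SOCK
        # BatchMode=yes disables password and passphrase prompts, so ssh
        # fails instead of waiting for input; ConnectTimeout bounds the wait.
        ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true
    )
}

# Print one OK/FAIL line per target host.
check_namenode_ssh() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        if can_ssh_noninteractive "$user" "$host"; then
            echo "OK   ${user}@${host}"
        else
            echo "FAIL ${user}@${host}"
        fi
    done
}
```

Run `check_namenode_ssh hadoop NN1_host NN2_host` on each NameNode host, as the same OS user that runs the ZKFC; every line should read OK.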
>
> Regards
> Jitendra
>
>
>
> On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
>
>> Hi Jitendra
>>   Yes
>>   I suspect the cause is that I need to run ssh-agent bash & ssh-add before I
>> can ssh from one NN to the other. Is that a problem?
>>
>> Regards
>>
>>
>>
>>
>> 2013/12/2 Jitendra Yadav <jeetuyadav200890@gmail.com>
>>
>>> Are you able to connect to both NN hosts using SSH without a password?
>>> Make sure you have the correct ssh keys in the authorized_keys file.
>>>
>>> Regards
>>> Jitendra
>>>
>>>
>>> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:
>>>
>>>> Hi Pavan
>>>>
>>>>
>>>>   I'm using sshfence
>>>>
>>>> ------core-site.xml-----------------
>>>>
>>>> <configuration>
>>>>  <property>
>>>>      <name>fs.defaultFS</name>
>>>>      <value>hdfs://lklcluster</value>
>>>>      <final>true</final>
>>>>  </property>
>>>>
>>>>  <property>
>>>>      <name>hadoop.tmp.dir</name>
>>>>      <value>/home/hadoop/tmp2</value>
>>>>  </property>
>>>>
>>>>
>>>> </configuration>
>>>>
>>>>
>>>> -------hdfs-site.xml-------------
>>>>
>>>> <configuration>
>>>>  <property>
>>>>      <name>dfs.namenode.name.dir</name>
>>>>     <value>/home/hadoop/namedir2</value>
>>>>  </property>
>>>>
>>>>  <property>
>>>>      <name>dfs.datanode.data.dir</name>
>>>>      <value>/home/hadoop/datadir2</value>
>>>>  </property>
>>>>
>>>>  <property>
>>>>    <name>dfs.nameservices</name>
>>>>    <value>lklcluster</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.ha.namenodes.lklcluster</name>
>>>>     <value>nn1,nn2</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>>>   <value>hadoop2:8020</value>
>>>> </property>
>>>> <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:8020</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:50070</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:50070</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.namenode.shared.edits.dir</name>
>>>>
>>>> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>>>
>>>> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.ha.fencing.methods</name>
>>>>   <value>sshfence</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>>>>    <value>/home/hadoop/.ssh/id_rsa</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>>>      <value>5000</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.journalnode.edits.dir</name>
>>>>    <value>/home/hadoop/journal/data</value>
>>>> </property>
>>>>
>>>> <property>
>>>>    <name>dfs.ha.automatic-failover.enabled</name>
>>>>       <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>>      <name>ha.zookeeper.quorum</name>
>>>>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>>>> </property>
>>>>
>>>> </configuration>
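
To confirm that each NameNode host actually resolves the fencing settings shown above, the values can be read back with hdfs getconf. A minimal sketch, assuming the hdfs command is on the PATH and picks up the same configuration directory as the daemons (show_fencing_conf is a made-up helper name):

```shell
#!/usr/bin/env bash
# Sketch: print the effective value of each fencing-related key.
# show_fencing_conf is a hypothetical helper name.
show_fencing_conf() {
    local key
    for key in dfs.ha.fencing.methods \
               dfs.ha.fencing.ssh.private-key-files \
               dfs.ha.fencing.ssh.connect-timeout \
               dfs.ha.automatic-failover.enabled; do
        # "hdfs getconf -confKey <key>" prints the value that the local
        # configuration resolves for that key; an unset key prints nothing here.
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key" 2>/dev/null)"
    done
}
```

Calling `show_fencing_conf` on hadoop2 and hadoop3 should print the same four values on both hosts if the two hdfs-site.xml copies match.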
>>>>
>>>>
>>>> 2013/12/2 Pavan Kumar Polineni <smartsunnyb4u@gmail.com>
>>>>
>>>>> Post your config files and say which method you are following for
>>>>> automatic failover.
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <
>>>>> yypvsxf19870706@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>>>>
>>>>>>   The cluster can fail over manually; however, automatic failover
>>>>>> fails.
>>>>>> I set up HA according to the URL
>>>>>>
>>>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>>>
>>>>>>   When I tested the automatic failover, I killed my active NN with kill
>>>>>> -9 <Pid-nn>, but the standby NameNode did not change to the active state.
>>>>>>   The log from my DFSZKFailoverController is shown in [1].
>>>>>>
>>>>>>  Please help me; any suggestions will be appreciated.
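
The kill -9 test described above is easier to judge if the HA state of both NameNodes is printed before and after the kill. A minimal sketch, assuming the hdfs command is on the PATH; print_nn_states is a made-up helper name, and nn1 and nn2 are the IDs from dfs.ha.namenodes.lklcluster in this thread:

```shell
#!/usr/bin/env bash
# Sketch: report the HA state of each NameNode ID given as an argument.
# print_nn_states is a hypothetical helper name.
print_nn_states() {
    local nn state
    for nn in "$@"; do
        # "hdfs haadmin -getServiceState <id>" prints active or standby.
        # It exits non-zero when the NameNode cannot be reached, which is
        # reported here as "unreachable".
        if state="$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)"; then
            echo "$nn: $state"
        else
            echo "$nn: unreachable"
        fi
    done
}
```

If automatic failover is working, `print_nn_states nn1 nn2` should show the surviving NameNode change from standby to active shortly after the kill; if it stays standby, the zkfc log on that host is the place to look.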
>>>>>>
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>> zkfc log [1] ----------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>>>>> Beginning Service Fencing Process... ======
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>>>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>>> Connecting to hadoop3...
>>>>>> 2013-12-02 19:49:28,590 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,592 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>>>>> SSH-2.0-OpenSSH_5.3
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>>>>> SSH-2.0-JSCH-0.1.42
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>>>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>>>> 2013-12-02 19:49:28,609 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
>>>>>> 2013-12-02 19:49:28,617 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>>>> 2013-12-02 19:49:28,617 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>>>> 2013-12-02 19:49:28,634 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>>>> 2013-12-02 19:49:28,635 WARN
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>>>>> (RSA) to the list of known hosts.
>>>>>> 2013-12-02 19:49:28,635 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>>>> 2013-12-02 19:49:28,635 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>>>> 2013-12-02 19:49:28,636 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>>>> 2013-12-02 19:49:28,637 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>>>> 2013-12-02 19:49:28,638 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,639 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>>> gssapi-with-mic
>>>>>> 2013-12-02 19:49:28,642 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>>> continue: publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,642 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>>> publickey
>>>>>> 2013-12-02 19:49:28,644 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>>>>> port 22
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>>> Unable to connect to hadoop3 as user hadoop
>>>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>>>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable
>>>>>> to fence service by any configured method.
>>>>>> 2013-12-02 19:49:28,645 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at
>>>>>> hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>>> 2013-12-02 19:49:28,646 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning
>>>>>> of election
>>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/10.7.23.124:8020
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,646 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x2429313c808025b closed
>>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper:
>>>>>> Initiating client connection,
>>>>>> connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000
>>>>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>>> socket connection to server hadoop3/10.7.23.124:2181. Will not
>>>>>> attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>>>> connection established to hadoop3/10.7.23.124:2181, initiating
>>>>>> session
>>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>>>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid
>>>>>> = 0x3429312ba330262, negotiated timeout = 5000
>>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Quitting master election for
>>>>>> NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is
>>>>>> necessary
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x3429312ba330262 closed
>>>>>> 2013-12-02 19:49:29,728 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old
>>>>>> client with sessionId 0x3429312ba330262
>>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
