hadoop-hdfs-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
Date Tue, 11 Mar 2014 21:06:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930954#comment-13930954 ]

Arpit Gupta commented on HDFS-6089:
-----------------------------------

Here is the console log:

{code}
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn1" hdfs
active
exit code = 0
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn2" hdfs
standby
exit code = 0
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null hostname "sudo su - -c \"cat /grid/0/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid | xargs kill -19\" hdfs"
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn1" hdfs
Operation failed: Call From host1/ip to host1:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host1/ip:35192 remote=host1/ip:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
exit code = 255
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn2" hdfs
Operation failed: Call From host2/ip to host2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host2/ip:37640 remote=host2/68.142.247.217:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
exit code = 255
{code}

> Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6089
>                 URL: https://issues.apache.org/jira/browse/HDFS-6089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>
> The following scenario was tested:
> * Determine Active NN and suspend the process (kill -19)
> * Wait about 60s to let the standby transition to active
> * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.
> What was noticed is that sometimes the call to get the service state of nn2 failed with a socket timeout.
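
The scenario above can be sketched as a small polling helper. This is only an illustration, not the actual test harness: the function names get_state and wait_for_active, the HDFS_CMD variable, and the retry counts are hypothetical. Only the "hdfs haadmin -getServiceState" call, the "kill -19" suspend step, and the roughly 60s wait come from the report. The sketch treats a failed state query (such as the exit code 255 timeouts in the log) as "not active yet" and retries rather than aborting.

```shell
# Hypothetical sketch of the failover check; helper names are assumptions.
# Assumes it runs as a user allowed to call "hdfs haadmin" (the log uses
# "sudo su - -c ... hdfs" for that).

HDFS_CMD="${HDFS_CMD:-/usr/bin/hdfs}"

# Print the HA state of one NameNode ("active" or "standby").
# Returns non-zero if the query itself fails, e.g. on a socket timeout.
get_state() {
  "$HDFS_CMD" haadmin -getServiceState "$1"
}

# Poll a NameNode until it reports "active" or the attempts run out.
# usage: wait_for_active <nn-id> <attempts> <sleep-seconds>
wait_for_active() {
  local nn attempts delay state i
  nn="$1"
  attempts="$2"
  delay="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failed query counts as "not active yet"; keep retrying.
    if state="$(get_state "$nn" 2>/dev/null)" && [ "$state" = "active" ]; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example use after suspending the prior active NN with "kill -19":
#   wait_for_active nn2 12 5 || echo "nn2 did not become active within ~60s"
```

Polling with retries separates "nn2 has not transitioned yet" from "the state query timed out once", which is the distinction this issue is about. A single query after a fixed sleep cannot tell the two apart.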



--
This message was sent by Atlassian JIRA
(v6.2#6252)
