hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3081) SshFenceByTcpPort uses netcat incorrectly
Date Tue, 20 Mar 2012 04:43:48 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3081:
------------------------------

    Attachment: hdfs-3081.txt

Attached patch fixes the problem.

I am still using nc to verify that it's down, because fuser can only find the listening
process when it runs as the same user as that process or as root. If the SSH user is the
wrong one, fuser won't find it.

I tested locally by using my external hostname and verifying the following in the logs:

12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Connected to todd-w510
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Looking for process running on port 8020
12/03/19 21:40:19 DEBUG ha.SshFenceByTcpPort: Running cmd: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 8020
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc...
12/03/19 21:40:19 DEBUG ha.SshFenceByTcpPort: Running cmd: nc -z todd-w510 8020
12/03/19 21:40:19 INFO ha.SshFenceByTcpPort: Verified that the service is down.
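
The sequence in the log above can be sketched roughly as follows. This is not the code from hdfs-3081.txt: the class name, the method names, and the runner callback (which stands in for running a command over the SSH session and returning its exit code) are hypothetical. The command strings are copied from the DEBUG lines. The sketch also assumes that fuser exits with 0 when it found the listener, and that "nc -z" exits with 0 when the connection succeeds.

```java
import java.util.function.ToIntFunction;

/** Hypothetical sketch of the fence-then-verify flow; not the actual patch. */
final class FenceCommandSketch {

    /** Kill command, as it appears in the DEBUG log line. */
    static String fuserKillCommand(int port) {
        return "PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp " + port;
    }

    /** Probe the host the service is bound to, rather than localhost. */
    static String ncProbeCommand(String host, int port) {
        return "nc -z " + host + " " + port;
    }

    /**
     * Returns true when the service is believed to be down.
     * The runner is a stand-in for executing a command remotely
     * and returning its exit code (an assumption of this sketch).
     */
    static boolean tryFence(String host, int port, ToIntFunction<String> runner) {
        int rc = runner.applyAsInt(fuserKillCommand(port));
        if (rc == 0) {
            return true; // assumed: fuser found the listener and killed it
        }
        // Indeterminate: fuser may not see a process owned by another user.
        // Assumed: nc -z exits 0 when it can connect, so 0 means still running.
        return runner.applyAsInt(ncProbeCommand(host, port)) != 0;
    }

    public static void main(String[] args) {
        // Fake runner: every command exits with 1 (nothing found, cannot connect).
        boolean fenced = tryFence("todd-w510", 8020, cmd -> 1);
        System.out.println(fenced ? "service is down" : "unable to fence");
    }
}
```

Passing the host into the probe is the point of the sketch: a check against localhost can report "down" for a service that is listening only on the external hostname.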

                
> SshFenceByTcpPort uses netcat incorrectly
> -----------------------------------------
>
>                 Key: HDFS-3081
>                 URL: https://issues.apache.org/jira/browse/HDFS-3081
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 0.24.0
>            Reporter: Philip Zeyliger
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3081.txt
>
>
> SshFenceByTcpPort currently assumes that the NN is listening on localhost.  Typical setups
> have the namenode listening only on the hostname of the namenode, which would cause "nc -z"
> against localhost to miss it.
> Here's an example in which the NN is running, listening on 8020, but doesn't respond
> to "localhost 8020".
> {noformat}
> [root@xxx ~]# lsof -P -p 5286 | grep -i listen
> java    5286 root  110u  IPv4            1772357              TCP xxx:8020 (LISTEN)
> java    5286 root  121u  IPv4            1772397              TCP xxx:50070 (LISTEN)
> [root@xxx ~]# nc -z localhost 8020
> [root@xxx ~]# nc -z xxx 8020
> Connection to xxx 8020 port [tcp/intu-ec-svcdisc] succeeded!
> {noformat}
> Here's the likely offending code:
> {code}
>         LOG.info(
>             "Indeterminate response from trying to kill service. " +
>             "Verifying whether it is running using nc...");
>         rc = execCommand(session, "nc -z localhost 8020");
> {code}
> Naively, we could either run netcat against the correct hostname (since the NN ought to be
> listening on the hostname it's configured as), or just use fuser.  Fuser catches ports
> independently of what IPs they're bound to:
> {noformat}
> [root@xxx ~]# fuser 1234/tcp
> 1234/tcp:             6766  6768
> [root@xxx ~]# jobs
> [1]-  Running                 nc -l localhost 1234 &
> [2]+  Running                 nc -l rhel56-18.ent.cloudera.com 1234 &
> [root@xxx ~]# sudo lsof -P | grep -i LISTEN | grep -i 1234
> nc         6766      root    3u     IPv4            2563626                 TCP localhost:1234 (LISTEN)
> nc         6768      root    3u     IPv4            2563671                 TCP xxx:1234 (LISTEN)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
