hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "UnknownHost" by ArpitAgarwal
Date Tue, 19 Jan 2016 23:54:04 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "UnknownHost" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/UnknownHost?action=diff&rev1=7&rev2=8

Comment:
Remove mentions of the deprecated hadoop-site.xml. Reorder possible causes, separate out less
likely causes.

  You get an Unknown Host Error, often wrapped in a Java {{{IOException}}}, when one machine
on the network cannot determine the IP address of a host that it is trying to connect to by
way of its hostname. This can happen during file upload (in which case the client machine
has the hostname problem), or inside the Hadoop cluster.
  
  
- Some possible causes (not an exclusive list):
+ Some possible causes, in approximately decreasing order of likelihood (not an exhaustive list):
+  1. DNS or hosts table misconfiguration:
+    a. You are using DNS but the site's DNS server does not have an entry for the node. '''Test:'''
do an {{{nslookup <hostname>}}} from the client machine.
+    a. You are using {{{/etc/hosts}}} entries but the calling machine's hosts file lacks
an entry for the host. FQDN entries in {{{/etc/hosts}}} files must contain a trailing dot.
See the [[#UsingHostsFiles|Using Hosts files]] section below. '''Test:''' do a {{{ping <hostname>.}}}
from the client machine (note trailing 'dot').
-  * The hostname in the configuration files (such as {{{hadoop-site.xml}}}) is misspelled.
+    a. The hostname in the configuration files (such as {{{core-site.xml}}}) is misspelled.
-  * The hostname in the configuration files (such as {{{hadoop-site.xml}}}) is confused with
the hostname of another service. For example, you are using the hostname of the YARN Resource
Manager in the {{{fs.defaultFS}}} configuration option to define the namenode.
+  1. The hostname in the configuration files (such as {{{core-site.xml}}}) is confused with
the hostname of another service. For example, you are using the hostname of the YARN Resource
Manager in the {{{fs.defaultFS}}} configuration option to define the namenode.
-  * The site's DNS server does not have an entry for the node. Test: do an {{{nslookup <hostname>}}}
from the client machine.
-  * The calling machine's host table {{{/etc/hosts}}} lacks an entry for the host, and DNS
isn't helping out. When using {{{/etc/hosts}}} based lookups the hostname entry must contain
a trailing dot. See the [[#UsingHostsFiles|Using Hosts files]] section below for more.
-  * A worker node thinks it has a given name -which it reports to the NameNode and JobTracker,
but that isn't the name that the network team gave it, so it isn't resolvable.
+  1. A worker node thinks it has a given name, which it reports to the NameNode and JobTracker,
but that isn't the name that the network team gave it, so it isn't resolvable.
-  * The calling machine is on a different subnet from the target machine, and short names
are being used instead of fully qualified domain names (FQDNs).
+  1. The calling machine is on a different subnet from the target machine, and short names
are being used instead of fully qualified domain names (FQDNs).
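
The resolution tests above can be sketched as a short shell session. {{{localhost}}} is used
here only so the commands are runnable anywhere; substitute the hostname that appears in your
{{{UnknownHostException}}}. {{{getent}}} is assumed to be available (it is on glibc-based systems).

```shell
# Sketch of the resolution tests above, run from the client machine.
# "localhost" keeps the commands runnable anywhere; substitute the
# hostname from the UnknownHostException when diagnosing a real cluster.
HOST=localhost

# DNS-server path: ask the configured DNS server. Requires nslookup
# (bind-utils/dnsutils); this may legitimately fail for names that only
# exist in /etc/hosts.
nslookup "$HOST" || echo "no DNS entry for $HOST"

# Hosts-file/NSS path: getent follows the "hosts:" line in
# /etc/nsswitch.conf, so it reports what the C library (and therefore
# most programs) will actually resolve.
getent hosts "$HOST"
```

The trailing-dot test ({{{ping <hostname>.}}}) is the complement to these: the dot marks the
name as absolute, bypassing the resolver's search-domain list, which distinguishes a missing
FQDN entry from a missing search domain.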
+ 
+ Less likely causes:
-  * The client's network card is playing up (network timeouts, etc), the network is overloaded,
or even the switch is dropping DNS packets.
+  1. The client's network card is playing up (network timeouts, etc), the network is overloaded,
or even the switch is dropping DNS packets.
-  * The host's IP address has changed but a long-lived JVM is caching the old value. This
is a known problem with JVMs (search for "java negative DNS caching" for the details and solutions).
The quick solution: restart the JVMs
+  1. The host's IP address has changed but a long-lived JVM is caching the old value. This
is a known problem with JVMs (search for "java negative DNS caching" for the details and solutions).
The quick solution: restart the JVMs.
-  * The site's DNS server is overloaded. This can happen in large clusters. Either move to
host table entries or use caching DNS servers in every worker node.
+  1. The site's DNS server is overloaded. This can happen in large clusters. Either move
to host table entries or run a caching DNS server on every worker node.
-  * Your ARP cache is corrupt, either accidentally or maliciously. If you don't know what
that means, you won't be in a position to verify this is the problem -or fix it.
+  1. Your ARP cache is corrupt, either accidentally or maliciously. If you don't know what
that means, you won't be in a position to verify this is the problem, or to fix it.
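
For the JVM-caching cause above, an alternative to restarting is to shorten the JDK's DNS
caches when the daemons are launched. A sketch, assuming the standard JDK networking system
properties {{{sun.net.inetaddr.ttl}}} and {{{sun.net.inetaddr.negative.ttl}}}; the 60s/10s
values are illustrative, not recommendations:

```shell
# Sketch: shorten the JVM's caches for successful and failed DNS lookups
# so a long-lived Hadoop daemon notices an address change without a full
# restart next time. The property names are legacy JDK networking system
# properties; the values are seconds and purely illustrative.
export HADOOP_OPTS="$HADOOP_OPTS -Dsun.net.inetaddr.ttl=60 -Dsun.net.inetaddr.negative.ttl=10"
```

The security-property equivalents ({{{networkaddress.cache.ttl}}} and
{{{networkaddress.cache.negative.ttl}}} in the JDK's {{{java.security}}} file) take precedence
over these system properties when both are set.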
  
  These are all network configuration/router issues. As it is your network, only you can find
out and track down the problem. That said, any tooling to help Hadoop track down such problems
in a cluster would be welcome, as would extra diagnostics. If you have to extend Hadoop to track
down these issues, submit your patches!
  
@@ -36, +39 @@

  {{{
  1.2.3.4  foo.example.com foo.example.com. foo
  }}}
- The Hadoop host resolver ensures hostnames are terminated with a trailing {{{.}}} prior
to lookup to avoid the security issue described in [[https://ietf.org/rfc/rfc1535.txt|RFC
1535]].
+ The Hadoop host resolver ensures hostnames are terminated with a trailing dot prior to lookup
to avoid the security issue described in [[https://ietf.org/rfc/rfc1535.txt|RFC 1535]].
  
  == Unknown Host Exception in HA HDFS ==
  
