hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ServerNotAvailable" by SteveLoughran
Date Thu, 30 Jun 2011 10:50:40 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ServerNotAvailable" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/ServerNotAvailable

Comment:
new page on understanding the server not available error

New page:
= Server Not Available Yet =

This message can appear in the logs of a DataNode:

{{{
2011-06-30 11:30:40,403 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 0 time(s).
2011-06-30 11:30:41,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 1 time(s).
2011-06-30 11:30:42,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 2 time(s).
2011-06-30 11:30:43,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 3 time(s).
2011-06-30 11:30:44,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 4 time(s).
2011-06-30 11:30:45,406 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 5 time(s).
2011-06-30 11:30:46,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 6 time(s).
2011-06-30 11:30:47,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 7 time(s).
2011-06-30 11:30:48,408 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 8 time(s).
2011-06-30 11:30:49,409 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 9 time(s).
2011-06-30 11:30:49,410 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310
not available yet, Zzzzz...
2011-06-30 11:30:51,411 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 0 time(s).
2011-06-30 11:30:52,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 1 time(s).
2011-06-30 11:30:53,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 2 time(s).
2011-06-30 11:30:54,413 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 3 time(s).
2011-06-30 11:30:55,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 4 time(s).
2011-06-30 11:30:56,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 5 time(s).
2011-06-30 11:30:57,415 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 6 time(s).
2011-06-30 11:30:58,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 7 time(s).
2011-06-30 11:30:59,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 8 time(s).
2011-06-30 11:31:00,417 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310.
Already tried 9 time(s).
2011-06-30 11:31:00,418 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310
not available yet, Zzzzz...
}}}

What's happening here is that the DataNode cannot connect to the NameNode. Rather than fail,
it assumes that the NameNode is temporarily offline: it hasn't started yet or is being restarted.
The DataNode will happily wait for the NameNode to come back up and, as soon as it does,
report in. After ten connection attempts roughly one second apart, the client backs off for
a couple of seconds, then tries again.
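
The retry-then-sleep pattern visible in the log can be sketched as follows. This is an illustrative Python sketch, not the Hadoop IPC client code: the function name `wait_for_server` and all of its parameters are hypothetical, and the ten attempts, one-second delay and short back-off are read off the log timestamps above rather than taken from Hadoop's configuration.

```python
import socket
import time


def wait_for_server(host, port, attempts_per_round=10, retry_delay=1.0,
                    backoff=1.0, max_rounds=None, connect=None,
                    sleep=time.sleep):
    """Keep trying to open a TCP connection to host:port.

    Returns the number of complete failed rounds that happened before the
    connection succeeded. Raises TimeoutError if max_rounds is given and
    every round fails. All names here are illustrative, not Hadoop APIs.
    """
    if connect is None:
        def connect(h, p):
            # Open and immediately close a connection; raises OSError on failure.
            socket.create_connection((h, p), timeout=retry_delay).close()

    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for attempt in range(attempts_per_round):
            try:
                connect(host, port)
                return rounds
            except OSError:
                print(f"Retrying connect to server: {host}:{port}. "
                      f"Already tried {attempt} time(s).")
                sleep(retry_delay)
        # A full round failed: back off, then start counting from zero again.
        print(f"Server at {host}:{port} not available yet, Zzzzz...")
        sleep(backoff)
        rounds += 1
    raise TimeoutError(f"{host}:{port} still unreachable after {rounds} round(s)")
```

The attempt counter resets after each back-off, which is why the log above goes from "Already tried 9 time(s)" back to "Already tried 0 time(s)". With `max_rounds=None` the loop never gives up, matching the DataNode behaviour of waiting indefinitely for the NameNode.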


This process of retrying and backing off is a key part of how an HDFS cluster handles the
temporary outage of a NameNode. It works well provided the network is set up and running correctly.
The same messages can also be triggered by cluster setup problems, which anyone setting up
a Hadoop cluster is likely to encounter:

 1. The NameNode hasn't been started yet. Fix: start the NameNode.
 2. The `fs.default.name` property in `core-site.xml` doesn't point to the correct hostname
for the NameNode, and the DataNodes are trying to connect to the wrong server. Look at the
server name in the log and verify it is valid.
 3. The port in the `fs.default.name` property is wrong. Verify the NameNode is listening
on that port; if not, correct the site settings.
 4. The client can't resolve the hostname, or it is resolving to the wrong address. Verify
the IP address in the logs.
 5. Connection problems. Look at the network connectivity options in the TroubleShooting page.
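
The checks in this list can be partly automated from the machine running the DataNode. The following is a minimal Python sketch, not a Hadoop tool: the helper names `namenode_address` and `check_namenode` are hypothetical, and it assumes `fs.default.name` holds a URI of the form `hdfs://host:port`.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_address(core_site_xml):
    """Extract (host, port) from the fs.default.name property.

    core_site_xml is the text of core-site.xml. Returns (None, None) if the
    property is missing; port is None if the URI does not specify one.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "fs.default.name":
            uri = urlparse((prop.findtext("value") or "").strip())
            return uri.hostname, uri.port
    return None, None


def check_namenode(host, port, timeout=3.0):
    """Return (resolved_ip, port_open).

    resolved_ip is None when the hostname cannot be resolved (item 4);
    port_open is False when nothing accepts connections there (items 1, 3, 5).
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

A possible use is to read the file with `open("core-site.xml").read()` (the path depends on the installation), pass the text to `namenode_address`, and hand the result to `check_namenode`. Compare the returned IP address with the one in the log lines: a mismatch with the real NameNode address points at name resolution, while a correct address with `port_open` false points at a stopped NameNode, a wrong port, or a network problem.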
