hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "TroubleShooting" by SteveLoughran
Date Tue, 15 Sep 2009 11:43:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/TroubleShooting

The comment on the change is:
link in the new tcp error pages, add something from the -user list

------------------------------------------------------------------------------
      at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
      at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)}}}
  
- This is sometimes encountered if there is a corruption of the {{edits}} file 
+ This is sometimes encountered if there is a corruption of the {{{edits}}} file
  in the transaction log. Try using a hex editor or equivalent to open
  up 'edits' and remove the last record. In these cases the last record
  may be incomplete, which prevents the NameNode from starting. Once you have
  updated your edits, start the NameNode and run {{{hadoop fsck /}}} to see if you
- have any corrupt files and fix/get rid of them. 
+ have any corrupt files and fix/get rid of them.
  
  Back up {{{dfs.name.dir}}} before modifying anything
  in it.
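
The backup step can be scripted. The following is only a minimal sketch, assuming the NameNode is stopped and that the directory configured as {{{dfs.name.dir}}} is readable by the user running the script. The function name {{{backup_name_dir}}} and both paths are hypothetical and not part of Hadoop.

```python
import shutil
import time
from pathlib import Path


def backup_name_dir(name_dir, backup_root):
    """Copy name_dir into a new timestamped directory under backup_root.

    name_dir is assumed to be the directory configured as dfs.name.dir;
    backup_root is any location with enough free space (both hypothetical).
    Returns the path of the new backup directory.
    """
    source = Path(name_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-backup-{stamp}"
    # copytree refuses to write into an existing target directory,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

Call it with the actual value of {{{dfs.name.dir}}} from your configuration and edit only the original once the copy has completed.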
  
  == Client cannot talk to filesystem ==
  
+ === TCP Level Error Messages ===
+ 
+  * NoRouteToHost
+  * ConnectionRefused
+ 
  === Error message: Could not get block locations. Aborting... ===
  
- There are couple of causes for this. 
+ There are a number of possible causes for this.
   * The namenode may be overloaded. Check the logs for messages that say "discarding calls..."
   * There may not be enough (any) datanodes for the data to be written. Again, check the logs.
-  * The datanodes on which the blocks were stored might be down. 
+  * The datanodes on which the blocks were stored might be down.
+ 
+ === Error message: Could not obtain block ===
+ 
+ Your logs contain something like
+ {{{INFO hdfs.DFSClient: Could not obtain block blk_-4157273618194597760_1160
+  from any node:  java.io.IOException: No live nodes contain current block}}}
+ 
+ There are no live datanodes containing a copy of the block of the file you are looking for. Bring up any nodes that are down, or skip that block.
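
To see which blocks are affected, the client logs can be scanned for this message. This is a rough sketch whose pattern is based only on the sample line above; other Hadoop versions may format the message differently, and the helper name {{{unavailable_blocks}}} is hypothetical.

```python
import re

# Pattern derived from the sample log line only (an assumption, not a
# documented log format): "Could not obtain block blk_<id>_<generation>".
BLOCK_ERROR = re.compile(r"Could not obtain block (blk_-?\d+(?:_\d+)?)")


def unavailable_blocks(log_lines):
    """Return the distinct block names reported as unobtainable, in first-seen order."""
    seen = {}
    for line in log_lines:
        for match in BLOCK_ERROR.finditer(line):
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Pass it any iterable of log lines, for example an open log file.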
  
  == Reduce hangs ==
  
  This can be a DNS issue. Two problems which have been encountered in practice are:
-  * Machines with multiple NICs. In this case, set dfs.datanode.dns.interface (in hdfs-site.xml) and mapred.datanode.dns.interface (in mapred-site.xml) to the name of the network interface used by Hadoop (something like eth0 under Linux),
+  * Machines with multiple NICs. In this case, set {{{dfs.datanode.dns.interface}}} (in {{{hdfs-site.xml}}}) and {{{mapred.datanode.dns.interface}}} (in {{{mapred-site.xml}}}) to the name of the network interface used by Hadoop (something like eth0 under Linux).
-  * Badly formatted hosts files (/etc/hosts under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
+  * Badly formatted or incorrect hosts files ({{{/etc/hosts}}} under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
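
One way to check that names resolve correctly is to test forward and reverse lookups on each node. The sketch below is illustrative only: the function name {{{check_name_resolution}}} is hypothetical, and the two heuristics it applies (flagging loopback addresses and comparing short host names) are assumptions about what a healthy setup looks like, not checks that Hadoop itself performs.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problem descriptions; an empty list means no problem was found.

    forward and reverse default to the system resolver and can be
    replaced with stubs for testing.
    """
    problems = []
    try:
        address = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return [f"cannot resolve {hostname!r}: {exc}"]

    # Heuristic (assumption): a cluster node name that maps to a loopback
    # address usually points at a misconfigured hosts file.
    if address.startswith("127."):
        problems.append(f"{hostname!r} resolves to loopback address {address}")

    try:
        reverse_name = reverse(address)[0]
    except OSError as exc:
        problems.append(f"no reverse lookup for {address}: {exc}")
        return problems

    # Compare short names only, so "node1" matches "node1.example.com".
    if reverse_name.split(".")[0].lower() != hostname.split(".")[0].lower():
        problems.append(
            f"{address} maps back to {reverse_name!r}, not {hostname!r}")
    return problems
```

Running {{{check_name_resolution(socket.gethostname())}}} on every node, and again with the names of the other nodes, gives a quick indication of whether the hosts files and DNS agree.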
  
