hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "NameNodeFailover" by SomeOtherAccount
Date Fri, 17 Sep 2010 15:57:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "NameNodeFailover" page has been changed by SomeOtherAccount.


- The name node is a critical resource for the cluster because data nodes don't know enough
about the blocks that they contain to coherently answer requests for anything but the block
contents.  This isn't generally a serious problem because single machines are typically fairly
reliable (it is only with a large cluster that we expect daily or hourly failures).
+ === Introduction ===
+ As of 0.20, Hadoop does not support automatic recovery in the case of a NameNode failure.
 This is a well-known single point of failure in Hadoop.  The good news is
that as you add more nodes to a grid, the statistical probability that any given node failure
is the NameNode goes down.
- That said, there is a secondary name node that talks to the primary name node on a regular
basis in order to keep track of the files in the system.  It does this by copying the fsimage
and editlog files from the primary name node.
+ Experience at Yahoo! shows that NameNodes are more likely to fail due to misconfiguration,
network issues, and bad behavior amongst clients than actual hardware problems.  Out of fifteen
grids over a three-year period, only three NameNode failures were related to hardware problems.
- If the name node dies, the simplest procedure is to simply use DNS to rename the primary
and secondary name nodes.  The secondary name node will serve as primary name node as long
as nodes request meta-data from it.  Once you get your old primary back up, you should reconfigure
it to be the secondary name node and you will be back in full operation.
+ === Configuring Hadoop for Failover ===
+ There are some preliminary steps that must be in place prior to performing a NameNode recovery.
 The most important is the dfs.name.dir property. This setting configures the NameNode such
that it can write to more than one directory.  A typical configuration might look something
like this:
- Note that the secondary name node only copies information every few minutes.  For a more
up-to-date recovery, you can make the name node log transactions to multiple directories,
including one network-mounted one.  You can then copy the fsimage and fsedit files from
that networked directory and have a recovery that is up to the second.
+   <property>
+     <name>dfs.name.dir</name>
+     <value>/export/hadoop/namedir,/remote/export/hadoop/namedir</value>
+   </property>
- Questions I still have include:
+ The first directory is a local directory and the second directory is an NFS-mounted directory.
 The NameNode will write to both locations, keeping the HDFS metadata in sync.  This allows
for storage of the metadata off-machine so that one will have something to recover. During
startup, the NameNode will pick the most recent version of these two directories to use and
then sync both of them to use the same data.
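
As a rough illustration of the dfs.name.dir setting described here, the following Python sketch reads the comma-separated value and reports whether each listed directory exists and is writable. The function names (`name_dirs`, `check_dirs`) are hypothetical, and the sample XML is an assumption that simply wraps the property from this page in a `<configuration>` root. It is a sanity check under those assumptions, not something Hadoop provides.

```python
import os
import xml.etree.ElementTree as ET

# Assumed sample: the property from this page inside a <configuration> root.
SAMPLE_CONF = """<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/export/hadoop/namedir,/remote/export/hadoop/namedir</value>
  </property>
</configuration>"""


def name_dirs(conf_xml):
    """Return the directories listed in dfs.name.dir, in order."""
    root = ET.fromstring(conf_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.name.dir":
            value = prop.findtext("value", "")
            # The value is a comma-separated list of directories.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def check_dirs(dirs):
    """Map each directory to True if it exists and is writable."""
    return {d: os.path.isdir(d) and os.access(d, os.W_OK) for d in dirs}


if __name__ == "__main__":
    for path, ok in check_dirs(name_dirs(SAMPLE_CONF)).items():
        print(path, "ok" if ok else "MISSING OR NOT WRITABLE")
```

Running a check like this before a failure happens is one way to notice that the NFS mount has silently gone away, in which case the off-machine copy would be stale.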
-  * what do you have to do to the old primary to make it be a secondary?
+ Once the NameNode is configured to write to two or more directories, we have a
working backup of the metadata.  In the more common failure scenarios, we
can use this data to bring the dead NameNode back from the grave.
-  * can you have more than one secondary name node (for off-site backup purposes)?
+ '''When a Failure Occurs'''
-  * are there plans for distributing the name node function?  
+ Now the recovery steps:
- === Answer ===
- The Secondary Namenode is not a failover mechanism.  It is a helper process
for the namenode and is of no help if the namenode fails.  The name is possibly misleading.
+  1. As a precaution, make a copy of the data on the remote NFS mount for safekeeping.
+  1. Pick a target machine on the same network.
+  1. Change the IP address of that machine to match the NameNode's IP address.  Using an
interface alias to provide this address movement works as well.  If this is not an option,
be prepared to restart the entire grid to avoid hitting https://issues.apache.org/jira/browse/HADOOP-3988
+  1. Install Hadoop the same way you did on the original NameNode.
+  1. Do '''not''' format this node!
+  1. Mount the remote NFS directory in the same location.
+  1. Start up the NameNode.
+  1. The NameNode should start replaying the edits file and updating the image, and block reports
should begin to come in.
+ At this point, your NameNode should be up.
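
The recovery steps above can be sketched as a dry-run plan. The Python below only builds and prints the commands; it executes nothing. Every path, address, and interface name in the usage example is a placeholder assumption, as is the function name `recovery_plan`. It also assumes a Linux target with the `ip` tool, an interface alias rather than a full re-addressing, and that the NFS-held metadata is already readable somewhere so the backup copy can be made first.

```python
import shlex


def recovery_plan(backup_src, backup_dest, namenode_cidr, iface,
                  nfs_export, name_dir, hadoop_home):
    """Return the recovery commands as argv lists; nothing is executed."""
    return [
        # Copy the NFS-held metadata somewhere safe before touching anything.
        ["cp", "-a", backup_src, backup_dest],
        # Add the old NameNode's address to this machine as an interface alias.
        ["ip", "addr", "add", namenode_cidr, "dev", iface],
        # Mount the remote NFS directory in the same location as before.
        ["mount", "-t", "nfs", nfs_export, name_dir],
        # Start the NameNode. The plan never includes a format command.
        [hadoop_home + "/bin/hadoop-daemon.sh", "start", "namenode"],
    ]


def render(plan):
    """Render each argv list as a shell-quoted line for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan]


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    plan = recovery_plan(
        backup_src="/mnt/nn-metadata",
        backup_dest="/var/backups/namedir-copy",
        namenode_cidr="10.0.0.5/24",
        iface="eth0",
        nfs_export="filer:/export/hadoop/namedir",
        name_dir="/remote/export/hadoop/namedir",
        hadoop_home="/opt/hadoop",
    )
    for line in render(plan):
        print(line)
```

Keeping the plan as data makes it easy to review with a colleague before running anything by hand, which matters when one wrong command (such as a format) would destroy the metadata you are trying to recover.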
- In order to provide redundancy for data protection in case of namenode failure the best
way is to store the namenode metadata on a different machine.  Hadoop has an option to have
multiple namenode directories and the recommended option is to have one of the namenode directories
on an NFS share.  However you have to make sure the NFS locking will not cause problems and
it is NOT recommended to change this on a live system because it can corrupt namenode data.
 Another option is to simply copy namenode metadata to another machine.
- --Ankur Sethi
- '''Question'''
+ '''Other Ideas'''
+ There are some other ideas to help with NameNode recovery:
- Why not keep the fsimage and editlog in the DFS (somehow that they could be located by data
nodes without the name node)?
- Then when the name node fails, by an election mechanism, a data node becomes the new name
- --Cosmin Lehene
+  1. Keep in mind that the SecondaryNameNode and/or the CheckpointNode also has an older
copy of the NameNode metadata.  If you haven't done the preliminary work above, you might
still be able to recover using the data on those systems.  Just note that it will only be
as fresh as the last run and you will likely experience some data loss.
+  1. Instead of using NFS on Linux, it may be worthwhile to look into DRBD.  A few sites
are using this with great success.
