hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "NameNodeFailover" by KonstantinShvachko
Date Mon, 11 Aug 2008 17:30:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:

The comment on the change is:

- The name node is a critical resource for the cluster because data nodes don't know enough
about the blocks that they contain to coherently answer requests for anything but the block
contents.  This isn't generally a serious problem because single machines are typically fairly
reliable (it is only with a large cluster that we expect daily or hourly failures).
+ deleted
- That said, there is a secondary name node that talks to the primary name node on a regular
basis in order to keep track of the files in the system.  It does this by copying the fsimage
and editlog files from the primary name node.
- If the name node dies, the simplest procedure is to use DNS to rename the primary
and secondary name nodes.  The secondary name node will serve as primary name node as long
as nodes request metadata from it.  Once you get your old primary back up, you should reconfigure
it to be the secondary name node and you will be back in full operation.
- Note that the secondary name node only copies information every few minutes.  For a more
up-to-date recovery, you can make the name node log transactions to multiple directories,
including a network-mounted one.  You can then copy the fsimage and editlog files from
that network-mounted directory and have a recovery that is up to the second.
- Questions I still have include:
-  * what do you have to do to the old primary to make it be a secondary?
-  * can you have more than one secondary name node (for off-site backup purposes)?
-  * are there plans for distributing the name node function?  
- === Answer ===
- The Secondary Namenode is not designed to be a failover mechanism.  It is a helper process
for the namenode and is of no help if the namenode fails.  The name is possibly misleading.
- To provide redundancy for data protection in case of namenode failure, the best
way is to store the namenode metadata on a different machine.  Hadoop has an option to have
multiple namenode directories, and the recommended setup is to have one of the namenode directories
on an NFS share.  However, you have to make sure that NFS locking will not cause problems, and
it is NOT recommended to change this on a live system because it can corrupt namenode data.
Another option is to simply copy the namenode metadata to another machine.
- --Ankur Sethi
- '''Question'''
- Why not keep the fsimage and editlog in the DFS (in such a way that data nodes could locate
them without the name node)?
- Then when the name node fails, by an election mechanism, a data node becomes the new name
- --Cosmin Lehene
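
The removed answer mentions simply copying the namenode metadata to another machine. A minimal sketch of that idea follows, assuming the target machine's storage is reachable as a mounted path and that the namenode is stopped while the copy runs. The function name, the label argument, and the paths in the usage comment are hypothetical and are not part of Hadoop.

```python
import shutil
from pathlib import Path


def backup_namenode_metadata(name_dir: str, backup_root: str, label: str) -> Path:
    """Copy a namenode metadata directory into backup_root/label.

    Assumption: the namenode is stopped (or the directory is otherwise
    quiescent), so the fsimage and edit log are copied in a consistent state.
    """
    source = Path(name_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"metadata directory not found: {source}")

    target = Path(backup_root) / label
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup with the same label is never overwritten.
    shutil.copytree(source, target)
    return target


# Hypothetical usage; both paths are placeholders:
# backup_namenode_metadata("/var/hadoop/name", "/mnt/backup/namenode", "2008-08-11")
```

Refusing to overwrite an existing backup is a deliberate choice in this sketch: a failed or partial copy then cannot silently replace a good one.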
