hadoop-hdfs-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Hadoop Failover and Recovery
Date Mon, 31 Aug 2009 17:01:36 GMT
On 8/28/09 8:58 PM, "sagar_shukla" <sagar_shukla@persistent.co.in> wrote:
>      What are the failover and recovery mechanisms available for Hadoop ? I
> searched over the internet but could not find any good documentation for
> different scenarios like datanode going down or namenode going down.

In most cases, the documentation for "fixing" Hadoop is:

A) fix hardware
B) clean out tmp files, etc
C) restart processes for that node

Name node is a bit of a special case. I'm amused that
http://wiki.apache.org/hadoop/NameNodeFailover is empty. :)

For name node, you have some preventative things to do first:

A) have matching hardware available
B) make sure the fsimage and edits files are being written to that machine,
or are at least available to it, via NFS, SMB, whatever it takes
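
One way to keep an eye on item B is a small check run periodically on the
spare box. The following is only a sketch in Python, not anything shipped
with Hadoop. It assumes the name directory layout of releases from around
this time, where the metadata lives in current/fsimage and current/edits
under the name directory. The function name, the one-hour threshold, and any
path you pass in are made up for illustration.

```python
import os
import time

# Files expected under <name_dir>/current. This is an assumption based on
# the on-disk layout of Hadoop releases from around the time of this thread.
REQUIRED_FILES = ("fsimage", "edits")


def check_name_dir(name_dir, max_age_seconds=3600, now=None):
    """Return a list of problems found in a copy of the name node metadata.

    An empty list means both files exist and the most recently modified
    one is no older than max_age_seconds.
    """
    now = time.time() if now is None else now
    current = os.path.join(name_dir, "current")
    problems = []
    newest = None
    for name in REQUIRED_FILES:
        path = os.path.join(current, name)
        if not os.path.isfile(path):
            problems.append("missing: " + path)
            continue
        mtime = os.path.getmtime(path)
        newest = mtime if newest is None else max(newest, mtime)
    if newest is not None and now - newest > max_age_seconds:
        problems.append("stale: newest file is %d seconds old" % (now - newest))
    return problems
```

Calling check_name_dir("/mnt/nn-backup") (a hypothetical NFS mount) returns
an empty list when the copy looks usable. On a quiet cluster the edits file
may legitimately go untouched for a while, so treat the age check as a hint
rather than proof of a broken backup.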

On failure, use that backup image to bring the name node back up on your
spare box.
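
The copy step of that recovery could look roughly like the Python sketch
below. It is an illustration under assumptions, not a tested procedure: the
function name and both directory arguments are hypothetical, and it assumes
the backup is a complete copy of the name directory.

```python
import os
import shutil


def restore_name_dir(backup_dir, name_dir):
    """Copy a name node metadata backup into an empty local name directory.

    Refuses to touch a non-empty target so that a mistaken run cannot
    clobber live metadata.
    """
    if os.path.isdir(name_dir) and os.listdir(name_dir):
        raise RuntimeError("refusing to overwrite non-empty " + name_dir)
    if os.path.isdir(name_dir):
        # copytree creates the target itself, so remove the empty directory.
        os.rmdir(name_dir)
    shutil.copytree(backup_dir, name_dir)
    return sorted(os.listdir(name_dir))
```

Once the metadata is in place, the name node process on the spare box is
started the usual way (bin/hadoop-daemon.sh start namenode in releases of
this era).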

Note that the NN isn't HA.  I suspect something like SunCluster or VCS could
be used here to make it less susceptible to issues, but I don't know if
anyone has tried it.
