hadoop-common-dev mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Namenode cluster and fail over
Date Fri, 07 Mar 2008 19:45:07 GMT
> We are evaluating a plan to migrate a 400 TB NetApp NAS storage system to
> the Hadoop file system.
> One crucial requirement for us is high availability and reliability of
> the storage system.
> According to the Hadoop architecture and design doc, a Namenode failure
> requires manual recovery from the Secondary NameNode. Is that still the case?

Manual recovery from the Secondary node is the last resort, used only if everything
else has failed.
The Namenode can be configured to save the image and the change logs into multiple
storage directories. We usually put them on different hard drives on the
same machine, or on directories mounted via NFS.
So even if the whole machine fails, you still have a copy of the image (the one on
NFS) that can be used to start the name-node on a new machine.
You use the Secondary node's copy only if all other copies are unavailable.
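
To make the fallback order concrete, here is a minimal Python sketch of how a
recovery script might choose which copy of the image to start from. It is not
part of Hadoop. The function name pick_image_dir is hypothetical, and the
default location of the image inside a storage directory (current/fsimage) is
an assumption you should check against your own installation.

```python
import os


def pick_image_dir(name_dirs, secondary_dir,
                   image_relpath=os.path.join("current", "fsimage")):
    """Return the first directory that holds a readable image file.

    name_dirs: the name-node's own storage directories (local disks, NFS
        mounts), in the order they should be tried.
    secondary_dir: the Secondary's checkpoint directory, tried only last.
    image_relpath: ASSUMED path of the image inside a storage directory.
    Returns None if no candidate has a readable image.
    """
    for candidate in list(name_dirs) + [secondary_dir]:
        if os.access(os.path.join(candidate, image_relpath), os.R_OK):
            return candidate
    return None
```

The Secondary's directory is appended at the end of the candidate list on
purpose, so it is picked only when none of the regular copies can be read.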

> Is there any plan to develop full replication of the Namenode to the
> SecondaryNameNode and to support real-time failover to the SecondaryNameNode
> in case of a Namenode failure?

Automatic recovery from the secondary node's image is one of our primary plans
and should be done pretty soon.
High availability is also a high priority, but it is not going to be done tomorrow.
For now you can use a scripting solution outside of Hadoop, such as running
a daemon that pings your name-node once in a while and shuts down and restarts the
cluster if something goes wrong.
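
Such a daemon could look roughly like the Python sketch below. It is only an
illustration under a few assumptions: "ping" is taken to mean opening a TCP
connection to the name-node port, the restart goes through the stock
bin/stop-all.sh and bin/start-all.sh scripts, and the function names, the
interval and the failure threshold are all made up for the example.

```python
import socket
import subprocess
import time


def namenode_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to the name-node port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_cluster(hadoop_home):
    """Stop, then start, the whole cluster with the stock control scripts.

    hadoop_home is the install directory (an assumption about your layout).
    """
    subprocess.run([f"{hadoop_home}/bin/stop-all.sh"], check=False)
    subprocess.run([f"{hadoop_home}/bin/start-all.sh"], check=True)


def watch(check, restart, interval=60.0, max_failures=3,
          max_checks=None, sleep=time.sleep):
    """Call check() once per interval; restart after repeated failures.

    restart() is called after max_failures consecutive failed checks, so a
    single dropped connection does not bounce the cluster. max_checks bounds
    the loop (None means run forever). Returns the number of restarts.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts
```

You would start it with something like
watch(lambda: namenode_reachable("namenode.example.com", 9000),
lambda: restart_cluster("/opt/hadoop")), where the host, port and path are
placeholders. Note that this only restarts the cluster on the same machines;
it does not move the name-node to a new host.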
Hope this helps.

