hadoop-common-dev mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Namenode cluster and fail over
Date Mon, 10 Mar 2008 20:47:14 GMT


mickey hsieh wrote:
> Thanks for elaborating on the details.
> 
> 
>>Automatic recovery from the secondary node image is one of our primary
>>plans.
>>Should be done pretty soon.
> 
> 
> Could you point to details?

Here is the link.
http://issues.apache.org/jira/browse/HADOOP-2585

>>High availability is also a high priority, but is not going to be done
>>tomorrow.
> 
> Any road map?
> Is there any plan for the client side (which keeps a primary and a backup
> server) to fail over automatically to the backup server if the primary
> server fails?

Yes, this looks simple from the client's point of view, but the two servers
must be kept in sync, and there are different ways to do that.
I don't have any deliverables, sorry.
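
To illustrate why the client side is the easy part, here is a minimal sketch in Python. It is not Hadoop's API: the function name `connect_with_failover` and the endpoint list are hypothetical, and it assumes the client simply opens a TCP connection to whichever server answers first. It does nothing to keep the two servers in sync, which is the hard problem mentioned above.

```python
import socket


def connect_with_failover(endpoints, connect=socket.create_connection, timeout=5.0):
    """Try each (host, port) pair in order and return the first that connects.

    Returns a tuple (endpoint, connection). Raises ConnectionError if
    every endpoint fails. The `connect` parameter exists so the network
    call can be replaced in tests.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint, timeout)
        except OSError as exc:
            # Remember the failure and fall through to the next server.
            errors.append((endpoint, exc))
    raise ConnectionError("no server reachable: %r" % (errors,))
```

A client would pass the primary first and the backup second, for example `[("primary.example.com", 8020), ("backup.example.com", 8020)]`; both host names and the port are placeholders.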


> 
> 
> On 3/7/08, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> 
>>>We are evaluating a plan to migrate a 400 TB NetApp NAS storage system to
>>>the Hadoop file system.
>>>
>>>One crucial requirement for us is high availability and reliability of
>>>the storage system.
>>>
>>>According to the Hadoop architecture and design doc, a Namenode failure
>>>requires manual recovery from the Secondary NameNode. Is that still the
>>>case?
>>
>>
>>
>>Manual recovery from the Secondary node is the last resort if everything
>>else has failed.
>>The Namenode can be configured to save the image and the change logs into
>>multiple storage directories. We usually configure them to be on different
>>hard drives on the same machine or mounted via NFS.
>>So even if the whole machine fails you have a copy of the image that can
>>be used to start the name-node on a new machine.
>>So you use the Secondary node's copy only if all other copies are
>>unavailable.
>>
>>
>>
>>>Is there any plan to develop full replication from the Namenode to the
>>>SecondaryNameNode and to support real-time failover to the
>>>SecondaryNameNode in case of Namenode failure?
>>
>>
>>Automatic recovery from the secondary node image is one of our primary
>>plans.
>>Should be done pretty soon.
>>High availability is also a high priority, but is not going to be done
>>tomorrow.
>>For now you can use a scripting solution outside of Hadoop, such as
>>running a daemon that pings your name-node once in a while and shuts down
>>and restarts the cluster if something goes wrong.
>>Hope this helps.
>>
>>
>>--Konstantin
>>
> 
> 
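
The watchdog suggested in the quoted message (a daemon that pings the name-node once in a while and restarts the cluster if something goes wrong) could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested operational tool: "ping" is taken to mean a TCP connection attempt, a restart is triggered only after several consecutive failures, and the host name, port, and the `bin/stop-all.sh` / `bin/start-all.sh` paths in `main()` are placeholders to adapt to your installation.

```python
import socket
import subprocess
import time


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_cluster(stop_cmd, start_cmd):
    """Run the stop command, then the start command (each an argv list)."""
    subprocess.call(stop_cmd)
    subprocess.call(start_cmd)


def watch(check, restart, max_failures=3, interval=60.0,
          sleep=time.sleep, max_checks=None):
    """Poll check(); call restart() after max_failures consecutive failures.

    Runs forever unless max_checks is given. Returns the number of
    restarts that were triggered.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0  # a healthy check resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval)
    return restarts


def main():
    # Hypothetical values: replace the host, port, and script paths.
    watch(
        check=lambda: is_reachable("namenode.example.com", 8020),
        restart=lambda: restart_cluster(["bin/stop-all.sh"],
                                        ["bin/start-all.sh"]),
    )
```

Requiring several consecutive failures before restarting is a design choice made here to avoid bouncing the cluster on a single dropped connection; the original suggestion does not specify a threshold.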
