hadoop-common-dev mailing list archives

From "mickey hsieh" <mickeyhs...@gmail.com>
Subject Re: Namenode cluster and fail over
Date Fri, 07 Mar 2008 20:01:47 GMT
Thanks for elaborating on the details.

> Automatic recovery from the secondary node image is one of our primary plans.
> Should be done pretty soon.

Could you point me to the details?

> High availability is also a high priority, but is not going to be done tomorrow.

Is there a roadmap?
Is there any plan for client-side failover, where the client knows both a
primary and a backup server and automatically fails over to the backup when
the primary fails?


Mickey


On 3/7/08, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
>
> > We are evaluating a plan to migrate a 400 TB NetApp NAS storage system to
> > the Hadoop file system.
> >
> > One of our crucial requirements is high availability and reliability of
> > the storage system.
> >
> > According to the Hadoop architecture and design document, a Namenode
> > failure requires manual recovery from the Secondary NameNode. Is that
> > still the case?
>
>
> Manual recovery from the Secondary node is the last resort if everything
> else fails.
> The Namenode can be configured to save the image and the change logs into
> multiple storage directories. We usually configure them to be on different
> hard drives of the same machine, or on a volume mounted via NFS.
> So even if the whole machine fails, you have a copy of the image that can be
> used to start the name-node on a new machine.
> You use the Secondary node's copy only if all other copies are unavailable.
>
>
> > Is there any plan to develop full replication of the Namenode to the
> > SecondaryNameNode and to support real-time failover to the
> > SecondaryNameNode in case of Namenode failure?
>
>
> Automatic recovery from the secondary node image is one of our primary plans.
> Should be done pretty soon.
> High availability is also a high priority, but is not going to be done
> tomorrow.
> For now you can use scripting solutions outside of Hadoop, such as a daemon
> that pings your name-node once in a while and shuts down and restarts the
> cluster if something goes wrong.
> Hope this helps.
>
>
> --Konstantin
>
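A minimal sketch of the multiple-directory setup described above, assuming the `dfs.name.dir` property in `hadoop-site.xml` of this era, which takes a comma-delimited list of directories and replicates the name table into each of them. The helper `name_dir_property` and all directory paths are hypothetical, and whether the edit log follows the same list may depend on the Hadoop release, so check `hadoop-default.xml` for your version.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Build a <property> element listing every name-node storage directory.

    Hypothetical helper: it only renders the config snippet, it does not
    talk to Hadoop. The value is a comma-delimited list, so a directory
    containing a comma cannot be expressed and is rejected.
    """
    if not dirs:
        raise ValueError("at least one directory is required")
    if any("," in d for d in dirs):
        raise ValueError("directory paths must not contain commas")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


# Hypothetical layout: two local disks plus one NFS-mounted directory.
print(name_dir_property(
    ["/disk1/hadoop/name", "/disk2/hadoop/name", "/mnt/nfs/hadoop/name"]
))
```

The NFS entry is what covers the "whole machine fails" case: it is the copy that survives when every local disk is lost.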

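The interim workaround Konstantin suggests (a daemon that pings the name-node and restarts the cluster) might be sketched as follows. Assumptions: liveness is approximated by a TCP connect to the name-node's RPC port (the host:port from `fs.default.name`), the cluster is restarted with the `bin/stop-all.sh` and `bin/start-all.sh` control scripts under a hypothetical install path, and `watch`, `namenode_reachable`, and `restart_cluster` are illustrative names, not Hadoop APIs.

```python
import socket
import subprocess
import time
from typing import Callable, Optional


def namenode_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the name-node RPC port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_cluster(hadoop_home: str) -> None:
    """Stop, then start, the whole cluster (assumed bin/ script layout)."""
    subprocess.run([f"{hadoop_home}/bin/stop-all.sh"], check=False)
    subprocess.run([f"{hadoop_home}/bin/start-all.sh"], check=True)


def watch(probe: Callable[[], bool],
          restart: Callable[[], None],
          interval: float = 30.0,
          max_failures: int = 3,
          rounds: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `probe`; after `max_failures` consecutive misses, call `restart`.

    Runs forever when `rounds` is None. Returns the number of restarts issued.
    """
    failures = restarts = checked = 0
    while rounds is None or checked < rounds:
        checked += 1
        if probe():
            failures = 0  # healthy check resets the miss counter
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0  # give the restarted cluster a fresh window
        sleep(interval)
    return restarts


# Example wiring (hypothetical host, port, and install path):
# watch(lambda: namenode_reachable("namenode.example.com", 9000),
#       lambda: restart_cluster("/opt/hadoop"))
```

Requiring several consecutive misses before restarting avoids bouncing the whole cluster on a single dropped connection. A successful connect only shows that the RPC port is open, not that the name-node is serving requests, so a production watchdog would likely use a stronger probe.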