hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
Date Thu, 09 Jun 2011 19:30:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767 ]

Aaron T. Myers commented on HDFS-1973:

Hi Hari,

bq. Can you please elaborate a little bit on your area of interest with ZOOKEEPER-1080?

As noted in Sanjay's design doc, one proposal for detecting NN failure would be to use an
external ZK service. The HDFS proposal doesn't go into great detail on this, but it suggests
using ZK with a heartbeat mechanism to see if the NN is still alive. I personally like the
ZK leader election recipe better (i.e. using ephemeral + sequence nodes).

Another possible use for ZK in the implementation of NN HA would be to use ZK as the source
of truth for clients to determine the active NN. This would seem to flow naturally from the
part of the ZK recipe which says "Applications may consider creating a separate znode to
acknowledge that the leader has executed the leader procedure." If NN HA were to utilize an
implementation of the ZK leader election recipe, then perhaps this "leader-procedure-complete
znode" could store the IP or hostname of the active NN which clients could use.
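
To make the idea above concrete, here is a minimal sketch of how the election and the
"leader-procedure-complete znode" might fit together. It is not the ZooKeeper client API:
FakeZk is a hypothetical in-memory stand-in, and the znode paths, hostnames, and function
names are assumptions made up for illustration. It also leaves out the watches that the
real recipe uses to notice when a predecessor node disappears.

```python
# Hypothetical in-memory stand-in for a ZooKeeper ensemble. None of these
# names are the real ZooKeeper client API; they only mimic its behaviour.

ELECTION_PREFIX = "/nn-election/n_"   # assumed path for candidate znodes
ACTIVE_PATH = "/nn-active"            # assumed "leader-procedure-complete" znode


class FakeZk:
    def __init__(self):
        self.nodes = {}     # path -> (data, owning session)
        self.next_seq = 0   # counter used for sequence suffixes

    def create_ephemeral(self, path, data, session):
        # An ephemeral znode lives only as long as its owning session.
        self.nodes[path] = (data, session)
        return path

    def create_ephemeral_sequential(self, prefix, data, session):
        # Append a zero-padded, monotonically increasing sequence number.
        path = "%s%010d" % (prefix, self.next_seq)
        self.next_seq += 1
        return self.create_ephemeral(path, data, session)

    def get_data(self, path):
        entry = self.nodes.get(path)
        return entry[0] if entry else None

    def children(self, prefix):
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def expire_session(self, session):
        # Session expiry removes every ephemeral znode the session owned.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def elect(zk):
    """Pick the candidate with the lowest sequence number as the active NN."""
    candidates = zk.children(ELECTION_PREFIX)
    if not candidates:
        return None
    address, session = zk.nodes[candidates[0]]
    # After completing the leader procedure, the leader publishes its address.
    zk.create_ephemeral(ACTIVE_PATH, address, session)
    return address


def resolve_active_namenode(zk):
    """What a client would do: read the active NN address, or None if unset."""
    return zk.get_data(ACTIVE_PATH)


if __name__ == "__main__":
    zk = FakeZk()
    zk.create_ephemeral_sequential(ELECTION_PREFIX, "nn1.example.com:8020", "s1")
    zk.create_ephemeral_sequential(ELECTION_PREFIX, "nn2.example.com:8020", "s2")
    elect(zk)
    print(resolve_active_namenode(zk))   # nn1.example.com:8020
    zk.expire_session("s1")              # nn1 dies, its znodes vanish
    print(resolve_active_namenode(zk))   # None
    elect(zk)
    print(resolve_active_namenode(zk))   # nn2.example.com:8020
```

In this sketch the active-NN znode is itself ephemeral and owned by the leader's session, so
a client sees no address at all, rather than a stale one, between the old leader's session
expiring and the new leader finishing its leader procedure. That is one possible choice, not
something the design doc specifies.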

I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and do that and
post my comments there.

I should also mention that we have not settled upon what strategy we'll take to do NN failure
detection or client failover. As noted in Sanjay's design doc, we're also strongly considering
using virtual IPs for client failover.

> HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
> ------------------------------------------------------------------------------------------
>                 Key: HDFS-1973
>                 URL: https://issues.apache.org/jira/browse/HDFS-1973
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Suresh Srinivas
>            Assignee: Aaron T. Myers
> During failover, a client must detect the failure of the current active namenode and switch
> over to the new active namenode. The switchover might make use of IP failover or something
> more elaborate, such as ZooKeeper, to discover the new active.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
