hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1623) High Availability Framework for HDFS NN
Date Fri, 18 Mar 2011 20:08:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008607#comment-13008607 ]

Suresh Srinivas commented on HDFS-1623:

> with the only difference being whether an active switch is enabled.
Yes. Active/standby will be the state of the namenode and code should be identical.

> It would be good if the code for active/standby detection was pluggable. So that different
> options for failover could be provided. It wouldn't be good to require that a zookeeper ensemble
> be set up just to run a namenode.
The document does not state that ZooKeeper is needed. The HA solution requires a quorum service,
which should be pluggable. ZK is one of the options.
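
As a rough illustration of what "pluggable" could mean here, the sketch below puts active/standby detection behind a small interface so that a ZK-backed implementation is only one option. All names (ActiveElector, InMemoryElector, NameNodeSketch) are hypothetical and are not taken from the attached document or from HDFS. The in-memory implementation only stands in for a real quorum service and does not coordinate across machines.

```java
/** Hypothetical plug-in point for active/standby detection; not an HDFS API. */
interface ActiveElector {
    /** Returns true if nodeId holds the active role after this call. */
    boolean tryBecomeActive(String nodeId);

    /** Releases the active role if nodeId currently holds it. */
    void relinquish(String nodeId);
}

/** Stand-in implementation with no external service (single JVM only). */
final class InMemoryElector implements ActiveElector {
    private String active; // null means no node is active

    @Override
    public synchronized boolean tryBecomeActive(String nodeId) {
        if (active == null) {
            active = nodeId;
        }
        return active.equals(nodeId);
    }

    @Override
    public synchronized void relinquish(String nodeId) {
        if (nodeId.equals(active)) {
            active = null;
        }
    }
}

/** Same code on both nodes; only the state differs. */
final class NameNodeSketch {
    enum State { ACTIVE, STANDBY }

    private final String id;
    private final ActiveElector elector;
    private State state = State.STANDBY;

    NameNodeSketch(String id, ActiveElector elector) {
        this.id = id;
        this.elector = elector;
    }

    /** Asks the elector and updates the local state accordingly. */
    State refreshState() {
        state = elector.tryBecomeActive(id) ? State.ACTIVE : State.STANDBY;
        return state;
    }

    /** Gives up the active role and returns to standby. */
    void stepDown() {
        elector.relinquish(id);
        state = State.STANDBY;
    }
}
```

A ZK-based or manually driven elector would implement the same interface, and the namenode code would not change.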

> How does heartbeat deal with network partition? My understanding of it is that it sends
> packets at intervals to the other node, and if they don't get through it considers the other
> dead. This could create a situation where both active and standby think that the other is
> dead, and both become active, leading to divergent filesystem states on each machine.
This is discussed in the document as split brain and fencing requirements, right?
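
To make the fencing idea concrete, here is a minimal sketch of one common technique, epoch-based fencing at the shared edit log. This is an assumption for illustration and not necessarily the mechanism the attached document proposes. FencedEditLog and its methods are hypothetical names, not HDFS classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared edit log that rejects writers with a stale epoch. */
final class FencedEditLog {
    private long highestEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    /** Called by a node when it becomes active; older epochs are fenced out. */
    synchronized long newEpoch() {
        return ++highestEpoch;
    }

    /** Appends only if the writer holds the latest epoch. */
    synchronized boolean append(long epoch, String edit) {
        if (epoch < highestEpoch) {
            return false; // stale writer: it still believes it is active
        }
        edits.add(edit);
        return true;
    }

    /** Returns a copy of the accepted edits. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(edits);
    }
}
```

If a partition leads both nodes to believe they are active, the node that claimed the earlier epoch has its writes rejected, so the shared log keeps a single history instead of diverging.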

> Also, the design indicates that more than 2 NN is out of scope. Why? Surely it's as easy
> to design for N namenodes as it is for 2 namenodes.
Why do you need more than 2 NNs? Having more than 2 NNs could remove the need for an outside
quorum service. But the number of NNs could be huge, especially in federated clusters.

> If you want manual failover, from the server perspective you need to do nothing. Operators
> can have 2 namenode machines, with the namenode only running on one, writing to shared storage.
> When they want to fail over to the standby they just have to ensure that the active is down
> and start the namenode daemon on the standby.
I am not sure what you are getting at here. Is this in reference to the attached document?

> I proposed a design last week for streaming updates from an active to a standby, it may
> be interesting to you (ZOOKEEPER-1016). It does have some mentions of active/standby detection,
> which I should remove. It occurs to me now, that this functionality should be separated out
> completely from the WALing and should live at the level of NameNode.java.
I do not understand this point. I will take a look at your proposal. But as regards this
jira, BookKeeper could be a component in the solution, not the only component.

> High Availability Framework for HDFS NN
> ---------------------------------------
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Namenode HA Framework.pdf

