hadoop-hdfs-issues mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1623) High Availability Framework for HDFS NN
Date Mon, 25 Jul 2011 17:02:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070611#comment-13070611 ]

Ted Dunning commented on HDFS-1623:

I guess it helps to quantify how "small" this window needs to be and whether current ZK is
able to provide the notification fast enough. If not, maybe implementing the ZAB protocol
as part of the namenode/backup-nodenode communication? (In that case it would be nice if
ZK exports its protocol as a library.)

The problem isn't really on the ZooKeeper side.

The problem is that you have to be able to detect failures quickly, but not capriciously.
Pretty much the only valid way to do this is with heartbeats of some kind, which means that
the interval between heartbeats is a lower bound on the detection time. Moreover, the monitored
program has to be sure that it can meet the real-time constraint imposed by the heartbeat
rate. With the NameNode and possible GC pauses, this is really, really hard to do for short
heartbeat intervals.
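
The trade-off above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from HDFS or ZooKeeper: the names HeartbeatDetector and first_suspicion, the millisecond clock, and the "tolerate N missed heartbeats" rule are all hypothetical choices made for illustration. The pause stands in for a stop-the-world GC in an otherwise healthy process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatDetector:
    """Hypothetical monitor that suspects a peer after too many missed heartbeats."""
    interval_ms: int       # expected gap between heartbeats
    missed_allowed: int    # heartbeats that may be missed before suspecting failure
    last_seen_ms: int = 0

    @property
    def timeout_ms(self) -> int:
        # Detection can never be faster than one interval; here it is a multiple of it.
        return self.interval_ms * self.missed_allowed

    def record_heartbeat(self, now_ms: int) -> None:
        self.last_seen_ms = now_ms

    def is_suspected(self, now_ms: int) -> bool:
        return now_ms - self.last_seen_ms > self.timeout_ms


def first_suspicion(detector: HeartbeatDetector, pause_start_ms: int,
                    pause_ms: int, horizon_ms: int) -> Optional[int]:
    """Simulate a live process that stalls once (for example, in GC).

    Returns the first millisecond at which the monitor suspects the process,
    or None if the stall goes unnoticed within the horizon.
    """
    detector.record_heartbeat(0)
    for now in range(horizon_ms + 1):
        paused = pause_start_ms <= now < pause_start_ms + pause_ms
        # The process sends a heartbeat on every interval boundary unless stalled.
        if now % detector.interval_ms == 0 and not paused:
            detector.record_heartbeat(now)
        if detector.is_suspected(now):
            return now
    return None


if __name__ == "__main__":
    # Same 2-second stall, two different heartbeat rates.
    print(first_suspicion(HeartbeatDetector(1000, 3), 5500, 2000, 20000))
    print(first_suspicion(HeartbeatDetector(100, 3), 5500, 2000, 20000))
```

With a 1-second interval and three tolerated misses, the 2-second stall goes unnoticed. With a 100 ms interval the same stall gets a process that never actually failed declared dead, which is the "capricious" detection the comment warns about.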

There is perhaps some mileage to be had from synchronizing any fencing with the heartbeat, so
that you at least get rid of that uncertainty.
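
One possible reading of tying fencing to the heartbeat is a lease: each heartbeat renews the active node's right to serve, and the standby waits until that right must have expired before taking over. The sketch below is an assumption about what such a scheme could look like, not something proposed in the JIRA. Lease, safe_takeover_time, and the clock-skew bound are hypothetical names and parameters.

```python
class Lease:
    """Hypothetical lease held by the active node and renewed by each heartbeat."""

    def __init__(self, duration_ms: int) -> None:
        self.duration_ms = duration_ms
        self.expires_at_ms = 0

    def renew(self, sent_at_ms: int) -> None:
        # Count from the moment the heartbeat was sent, which is the
        # conservative choice for the node holding the lease.
        self.expires_at_ms = sent_at_ms + self.duration_ms

    def may_serve(self, now_ms: int) -> bool:
        # The active node fences itself once the lease has run out.
        return now_ms < self.expires_at_ms


def safe_takeover_time(last_heartbeat_received_ms: int, duration_ms: int,
                       max_clock_skew_ms: int) -> int:
    """Earliest time the standby may take over.

    Assumes clocks differ by at most max_clock_skew_ms and that a heartbeat
    is never received before it was sent.
    """
    return last_heartbeat_received_ms + duration_ms + max_clock_skew_ms


if __name__ == "__main__":
    lease = Lease(duration_ms=3000)
    lease.renew(sent_at_ms=5000)
    takeover = safe_takeover_time(5010, 3000, 50)
    print(lease.may_serve(7999), lease.may_serve(8000), takeover)
```

This removes the guesswork about when the old active node stops, but only under the stated clock assumptions. A pause that lands between the may_serve check and the actual write can still let a stale operation through, so the sketch narrows the window rather than closing it.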

Overall, this is a hard problem, which is why protocols like two-phase commit exist.

> High Availability Framework for HDFS NN
> ---------------------------------------
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode HA_v2_1.pdf,
>                      Namenode HA Framework.pdf

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

