hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
Date Wed, 20 Jul 2011 23:09:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068700#comment-13068700
] 

Todd Lipcon commented on HDFS-2179:
-----------------------------------


h3. Fencing overview

In order to fence a NN, there are several different methods, at varying levels of nastiness:

1) Cooperative active->standby transition or shutdown
In the case of a manual failover, the old primary can gracefully either transition to a standby
mode, or gracefully shut down. In this case, since we assume the software to be cooperative,
no real "fencing" is necessary -- the new NN just needs to unambiguously confirm that the
old NN has dropped out of active mode.

This method succeeds only if the old NN remains in full operation.

2) Process killing or death verification (eg via ssh or a second daemon)
In the case that the old primary has either hung (eg deadlock) or crashed (eg JVM segfault),
but the host is OK, the new primary may contact that host and send SIGKILL to the NameNode
JVM. This may be done either via ssh or via contacting some process which is still running
on the node. It is also sufficient to _verify_ that the NN process is no longer running in
the case that its JVM crashed.

This method succeeds only if the _host_ of the old NN remains in full operation, despite the
NN itself being deadlocked or crashed.

3) Storage fencing
Depending on the type of storage in which the old NN stores its edits directories, the new
NN may explicitly fence the storage. This is typically accomplished using a vendor-specific
extension. For example, NetApp filers support the command "exportfs -b enable save <nnhost.com>
/vol/vol0" which can be remotely issued in order to disallow any further access to a particular
mount by a particular host.

In the case of edits stored on BookKeeper in the future, we may be able to implement some
kind of lease revocation or fencing within that storage system.

4) Network port fencing
Many switches support remote management. One way to prevent a NameNode from responding to
any further requests is to forcibly disable its network port. An alternative similar mechanism
is to use something like a LOM card to remotely disable the NIC.

5) Power port fencing (aka STONITH)
Many power distribution units (PDUs) support remote management. The last ditch effort to fence
a node is to literally "pull the power"


h3. Proposal

Since methods 3-5 above are usually vendor-specific implementations, it does not make sense
to try to implement a catch-all fencing mechanism within Hadoop. Instead, operators are likely
to want to use commonly available shell scripts that work against their preferred hardware.
Given this, I would propose that Hadoop's fencing behavior be:

- Configure a list of "fence methods", each with an associated priority.
- Each fence method returns an exit code indicating whether it has successfully fenced the
target node.
- If any method succeeds, no further method is attempted.
- If a method fails, continue down the list to try the next method.
- If all fence methods fail, then both nodes remain in "standby" state, and an administrator
must manually force the transition after verifying that the other node is no longer active.

The first fence method will always be the "cooperative" method. We can also ship with Hadoop
an implementation of method #2 (shoot-the-other-process-in-the-head via ssh). Methods 3-5
would probably be fulfilled by custom site-specific shell scripts, example snippets on a wiki,
or existing tools like the fence_* programs that are available from Red Hat.

h3. Open questions
- do we need to have any kind of framework for unfencing built in to Hadoop? Or is it up to
an administrator to "unfence"?
- is it actually a good idea to include "Cooperative shutdown" in this same framework? or
should we only call fence when we know it's uncooperative?


> HA: namenode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-2179
>                 URL: https://issues.apache.org/jira/browse/HDFS-2179
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is active at
a time has to be preserved in order to prevent "split brain syndrome." Thus, when a standby
NN is transition to "active" state during a failover, it needs to somehow _fence_ the formerly
active NN to ensure that it can no longer perform edits. This JIRA is to discuss and implement
NN fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message