hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs
Date Wed, 04 Apr 2012 22:05:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246778#comment-13246778 ]

Todd Lipcon commented on HDFS-3192:

I think there is confusion here over the terminology "loses quorum".

I agree completely about the following: if any NN fails to sync its edit logs, it needs to
abort. This is already what it does today, so no changes are necessary.
If the edit log happens to be implemented using a quorum protocol (HDFS-3092 or HDFS-3077,
for example), then that behavior should be maintained: the JournalManager implementation needs
to throw an exception from logSync() when it cannot persist the edits. That exception will cause the NN to abort.

That's all that's necessary for correctness: an NN won't ack "success" for a write unless
it has successfully synced it, and it will abort rather than roll back, since we have no rollback capability.
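The abort-on-failed-sync rule can be sketched in Java as follows. This is a minimal model under stated assumptions, not the HDFS code: `QuorumJournal`, `JournalNode`, and `NameNodeSketch` are hypothetical names, and the simple strict-majority count stands in for whatever quorum protocol HDFS-3092 or HDFS-3077 actually implements. What it illustrates is the ordering argued above: the write path reports "success" only after logSync() returns, and a failed sync is treated as fatal rather than undone.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical model (not the HDFS API): a journal replicated to several
// nodes, where logSync() succeeds only if a strict majority persisted the edit.
public class QuorumSyncSketch {

    /** One replica of the edit log; returns true if it durably stored the edit. */
    interface JournalNode {
        boolean persist(long txid, String edit);
    }

    /** Hypothetical quorum-backed journal. */
    static class QuorumJournal {
        private final List<JournalNode> nodes;

        QuorumJournal(List<JournalNode> nodes) {
            this.nodes = nodes;
        }

        /** Throws IOException unless a strict majority of nodes acked the edit. */
        void logSync(long txid, String edit) throws IOException {
            int acks = 0;
            for (JournalNode node : nodes) {
                if (node.persist(txid, edit)) {
                    acks++;
                }
            }
            int majority = nodes.size() / 2 + 1;
            if (acks < majority) {
                throw new IOException("only " + acks + " of " + nodes.size()
                        + " journal nodes acked txid " + txid);
            }
        }
    }

    /** Hypothetical NN write path: ack only after a successful sync, abort on failure. */
    static class NameNodeSketch {
        private final QuorumJournal journal;
        boolean aborted = false;

        NameNodeSketch(QuorumJournal journal) {
            this.journal = journal;
        }

        /** Returns true ("success") only if the edit was synced to a quorum. */
        boolean write(long txid, String edit) {
            if (aborted) {
                throw new IllegalStateException("NN has aborted; no further writes");
            }
            try {
                journal.logSync(txid, edit);
                return true;
            } catch (IOException e) {
                // No rollback capability: mark the NN dead instead of undoing the edit.
                // A real NN would terminate the process at this point.
                aborted = true;
                return false;
            }
        }
    }

    public static void main(String[] args) {
        JournalNode up = (txid, edit) -> true;
        JournalNode down = (txid, edit) -> false;

        NameNodeSketch healthy = new NameNodeSketch(new QuorumJournal(List.of(up, up, down)));
        System.out.println(healthy.write(1, "mkdir /a") + " aborted=" + healthy.aborted);

        NameNodeSketch lostQuorum = new NameNodeSketch(new QuorumJournal(List.of(up, down, down)));
        System.out.println(lostQuorum.write(1, "mkdir /a") + " aborted=" + lostQuorum.aborted);
    }
}
```

Running main prints `true aborted=false` for the 2-of-3 case and `false aborted=true` for the 1-of-3 case; after aborting, the sketch refuses any further write.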

In the above sense, "loses quorum" really means "loses write access to the edit logs".

If instead you're talking about "loses quorum" as "loses its ZK session", then no abort is
necessary, because the NN may still be able to write to its edits. As long as it keeps getting "success"
back from editLog.logSync(), the edits are being persisted. It is the responsibility
of the next active to fence access to the shared edits. It may do so in one of two ways:
1) Edits fencing: ensure that the next write to the edits mechanism throws an IOException. In the case
of FileJournalManager on NAS, this is done via an RPC to the NAS system to fence the given
2) STONITH: ensure that the next write fails because power has been yanked from the machine.
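Edits fencing (option 1) can be illustrated with an epoch-number scheme. This is a swapped-in technique chosen for the sketch, not the NAS fencing RPC described above: the hypothetical `SharedEdits` storage remembers the highest writer epoch it has granted, and any append tagged with an older epoch throws IOException. The key property is that the storage enforces the fence, so it works even if the old active never learns that it has been replaced.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "edits fencing" using epoch numbers: once a new
// writer claims a higher epoch, any write tagged with an older epoch fails.
public class EpochFencingSketch {

    /** Hypothetical shared edits storage; not a real HDFS class. */
    static class SharedEdits {
        private long highestEpoch = 0;
        private final List<String> edits = new ArrayList<>();

        /** Called by a new active before it writes; implicitly fences all older epochs. */
        synchronized long fenceAndClaimEpoch() {
            highestEpoch++;
            return highestEpoch;
        }

        /** Rejects writes from any writer whose epoch has been superseded. */
        synchronized void append(long writerEpoch, String edit) throws IOException {
            if (writerEpoch < highestEpoch) {
                throw new IOException("writer epoch " + writerEpoch
                        + " fenced by epoch " + highestEpoch);
            }
            edits.add(edit);
        }

        /** Returns a copy of the accepted edits. */
        synchronized List<String> edits() {
            return new ArrayList<>(edits);
        }
    }

    public static void main(String[] args) throws IOException {
        SharedEdits shared = new SharedEdits();
        long nn1Epoch = shared.fenceAndClaimEpoch(); // NN1 becomes active
        long nn2Epoch = shared.fenceAndClaimEpoch(); // NN2 fences NN1, becomes active

        try {
            shared.append(nn1Epoch, "edit-from-NN1");
            System.out.println("NN1 write accepted");
        } catch (IOException e) {
            // NN1's next write fails, which (per the rule above) makes NN1 abort.
            System.out.println("NN1 write rejected: " + e.getMessage());
        }
        shared.append(nn2Epoch, "edit-from-NN2");
        System.out.println(shared.edits());
    }
}
```

Running main prints `NN1 write rejected: writer epoch 1 fenced by epoch 2` followed by `[edit-from-NN2]`.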

Alternatively, the new active may first try a "graceful transition":
3) Gracefully ask the prior active to stop writing. The prior active flushes anything buffered,
successfully syncs, and then enters standby mode.
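The graceful transition (option 3) amounts to "flush, sync, then refuse writes". The sketch below models that sequence with a hypothetical `NameNode` class whose `durable` list stands in for the shared edit log; the names and the in-memory buffer are assumptions for illustration only. Because this path depends on the prior active cooperating, it presumably cannot replace fencing when the prior active is hung or partitioned, which fits the comment's framing of it as something the new active "may first try".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a graceful transition: the prior active flushes its
// buffered edits, syncs them, and then refuses further writes as a standby.
public class GracefulTransitionSketch {

    enum State { ACTIVE, STANDBY }

    /** Hypothetical NN model; "durable" stands in for the shared edit log. */
    static class NameNode {
        private State state = State.ACTIVE;
        private final List<String> buffered = new ArrayList<>();
        final List<String> durable;

        NameNode(List<String> sharedLog) {
            this.durable = sharedLog;
        }

        /** Buffers an edit; only an active NN may write. */
        synchronized void logEdit(String edit) {
            if (state != State.ACTIVE) {
                throw new IllegalStateException("standby NN cannot write edits");
            }
            buffered.add(edit);
        }

        /** Moves every buffered edit into the durable log. */
        synchronized void logSync() {
            durable.addAll(buffered);
            buffered.clear();
        }

        /** Flush anything buffered, sync it, then enter standby mode. */
        synchronized void transitionToStandby() {
            logSync();
            state = State.STANDBY;
        }

        synchronized State state() {
            return state;
        }
    }

    public static void main(String[] args) {
        List<String> sharedLog = new ArrayList<>();
        NameNode nn1 = new NameNode(sharedLog);
        nn1.logEdit("mkdir /a");
        nn1.logEdit("mkdir /b");   // still buffered, not yet synced

        nn1.transitionToStandby(); // graceful request from the new active
        System.out.println(nn1.state() + " " + sharedLog);

        try {
            nn1.logEdit("mkdir /c");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints `STANDBY [mkdir /a, mkdir /b]` and then `rejected: standby NN cannot write edits`.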

Notably, "self-STONITH upon losing the ZK lease" is not an option, because it may take arbitrarily
long before the node notices that the lease is gone. For example:
1) NN1 writing to edits log
2) ZKFC1 loses lease, but doesn't know about it yet
3) ZKFC2 gets lease
4) NN2 becomes active, starts writing logs
5) NN1 writes some edits. World explodes.
6) ZKFC1 gets an asynchronous notification from ZK that it lost its session. Anything you do
at this point is _too late_.

Before step 4, NN2 must use a fencing mechanism, *regardless* of whatever steps NN1 or ZKFC1
might take in step 6.
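The six-step race can be replayed in a small simulation to show why the fence has to happen before step 4. This is a hypothetical model, not HDFS code: `SharedEdits.fence(writer)` stands in for any fencing mechanism that makes a given writer's later appends throw IOException, and the ZK and ZKFC steps are reduced to comments because in this race they have no effect on the log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replay of the race above; "fenceFirst" controls whether NN2
// fences NN1 before becoming active (i.e. before step 4).
public class SplitBrainReplay {

    /** Hypothetical shared edits; fence(writer) makes that writer's later appends fail. */
    static class SharedEdits {
        private final Set<String> fenced = new HashSet<>();
        final List<String> log = new ArrayList<>();

        void fence(String writer) {
            fenced.add(writer);
        }

        void append(String writer, String edit) throws IOException {
            if (fenced.contains(writer)) {
                throw new IOException(writer + " is fenced");
            }
            log.add(writer + ":" + edit);
        }
    }

    /** Returns the edits that ended up in the shared log. */
    static List<String> replay(boolean fenceFirst) {
        SharedEdits shared = new SharedEdits();
        try {
            shared.append("NN1", "a");   // 1) NN1 writing to edits log
            // 2) ZKFC1 loses lease, doesn't know yet. 3) ZKFC2 gets lease.
            if (fenceFirst) {
                shared.fence("NN1");     // fencing before step 4
            }
            shared.append("NN2", "b");   // 4) NN2 becomes active, writes
        } catch (IOException e) {
            throw new AssertionError("NN1 and NN2 should be unfenced here", e);
        }
        try {
            shared.append("NN1", "c");   // 5) NN1 writes some edits
        } catch (IOException e) {
            // Only reached when fenced: NN1's write fails and it would abort.
        }
        // 6) ZKFC1 learns it lost its session; nothing it does can remove "NN1:c".
        return shared.log;
    }

    public static void main(String[] args) {
        System.out.println("no fencing:  " + replay(false));
        System.out.println("fence first: " + replay(true));
    }
}
```

Without fencing the log ends up as `[NN1:a, NN2:b, NN1:c]`, with two actives interleaving writes; with the fence applied before step 4 it is `[NN1:a, NN2:b]`, and NN1's late write fails instead.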

> Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for
timeout secs
> --------------------------------------------------------------------------------------------------
>                 Key: HDFS-3192
>                 URL: https://issues.apache.org/jira/browse/HDFS-3192
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude


