hadoop-hdfs-issues mailing list archives

From "Hari Mankude (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3192) Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for timeout secs
Date Thu, 05 Apr 2012 02:35:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246949#comment-13246949 ]

Hari Mankude commented on HDFS-3192:

bq.2a) If the local node is still accessible, then the other node will have already called
transitionToStandby(), in which case our own call will be a no-op (since we're already in
standby state). Everything is correct, because the transitionToStandby() call flushes everything
and gracefully closes its edit log writer.

This is very dangerous and can result in all sorts of races.

1. ZKFC2 initiates transitionToStandby() to NN1.
2. Meanwhile, the RPC has not yet started executing on NN1.
3. ZKFC2 loses the znode.
4. ZKFC1 now takes over.
5. ZKFC1 does a becomeActive() on NN1.
6. transitionToStandby() starts executing, converting NN1 to standby. There are now no active
NNs in the cluster.
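
The sequence above can be replayed as a toy model. This is only a sketch under assumptions: the NameNode class, the pending-RPC list and replay_race() are hypothetical names for illustration and are not Hadoop code. It shows how a transitionToStandby() that is sent early but executed late leaves the cluster with no active NN.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    # Hypothetical stand-in for a NameNode; only tracks the HA state.
    name: str
    state: str = "standby"

    def transition_to_active(self):
        self.state = "active"

    def transition_to_standby(self):
        self.state = "standby"


def replay_race():
    nn1 = NameNode("NN1", state="active")
    nn2 = NameNode("NN2")
    pending = []  # RPCs that were sent but have not executed yet

    # 1. ZKFC2 sends transitionToStandby() to NN1.
    pending.append(nn1.transition_to_standby)
    # 2. The RPC has not started executing on NN1.
    # 3./4. ZKFC2 loses the znode and ZKFC1 takes over (not modelled further).
    # 5. ZKFC1 makes NN1 active.
    nn1.transition_to_active()
    # 6. The delayed RPC finally executes and NN1 goes to standby.
    for rpc in pending:
        rpc()

    return [nn.name for nn in (nn1, nn2) if nn.state == "active"]
```

Calling replay_race() returns the list of active NNs at the end of the sequence, which is empty in this model.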

ZKFC should communicate ONLY with its local NN. Otherwise, it will result in all sorts of
messy race conditions. Communication between NNs should be via ZooKeeper znodes, edit logs
and datanodes.

bq.    1) NN1 writing to edits log
    2) ZKFC1 loses lease, but doesn't know about it yet
    3) ZKFC2 gets lease
    4) NN2 becomes active, starts writing logs
    5) NN1 writes some edits. World explodes.
    6) ZKFC1 gets asynchronous notification from ZK that it lost its session. Anything you
do at this point is too late.
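
The quoted sequence can also be replayed as a toy model. This is a sketch under assumptions: the shared edits log is a plain list, the transaction ids are invented for illustration, and replay_split_brain() is a hypothetical name. Nothing here is Hadoop code; it only shows that a notification arriving at step 6 cannot undo a write made at step 5.

```python
def replay_split_brain():
    edit_log = []  # shared edits log, modelled as (writer, txid) pairs
    believes_active = {"NN1": True, "NN2": False}

    def write(nn, txid):
        # A NN writes whenever it believes it is active; nothing fences it.
        if believes_active[nn]:
            edit_log.append((nn, txid))

    write("NN1", 1)                  # 1) NN1 writing to the edits log
    # 2)/3) ZKFC1 loses the lease and ZKFC2 gets it; NN1 is not told yet.
    believes_active["NN2"] = True    # 4) NN2 becomes active
    write("NN2", 2)                  #    and starts writing logs
    write("NN1", 2)                  # 5) NN1 writes some edits as well
    believes_active["NN1"] = False   # 6) notification arrives, too late
    return edit_log
```

In this model both NNs end up writing the same transaction id, which is the conflict the quoted text describes.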

bq.Doing RPC to your own NN is subject to way more race conditions because we have no way
of enforcing an ordering between NN1 going standby and NN2 becoming active. NN2 has to verify
that NN1 is either standby or effectively dead before becoming active. The only way to do
that is to first (a) ask it to be standby, or (b) fence.

I disagree. Doing RPC to your own NN is the safest mechanism available in the HA environment.
It is definitely safer than doing the RPC to a remote NN. Do you agree?
To be clear, I also consider fencing required, and I am not suggesting this method as an
alternative to fencing. Instead, this method reduces the number of situations in which the
complicated fencing algorithm has to be used, which lowers the probability of error.
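
The method in this issue's title (the active NN exits when it has not received a getServiceStatus() rpc from its ZKFC for timeout secs) could look roughly like the sketch below. The class name ZkfcWatchdog, its methods and the injectable clock are assumptions made for illustration; this is not the NameNode implementation and it does not replace fencing.

```python
import time


class ZkfcWatchdog:
    # Hypothetical helper: tracks when the local ZKFC last called in.
    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.last_seen = clock()

    def on_get_service_status(self):
        # Called whenever a getServiceStatus() rpc arrives from the ZKFC.
        self.last_seen = self.clock()

    def should_exit(self):
        # True once the ZKFC has been silent for longer than the timeout.
        return self.clock() - self.last_seen > self.timeout_secs
```

An active NN would poll should_exit() periodically and abort when it returns True. The clock is injectable so the timeout logic can be checked without sleeping.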

bq. The "self-resign" in step 6 is insufficient. We have to fence between step 3 and step
4. Whatever NN1 happens to do after that point doesn't help anything because it's too late.

I am not talking about self-resign in this situation. Self-resign as per this jira will happen
only if ZKFC1 is dead. In the above example, ZKFC1 is not dead. 
For the above example, ZKFC1 should abort NN1 when znode state change has happened and restart

> Active NN should exit when it has not received a getServiceStatus() rpc from ZKFC for
timeout secs
> --------------------------------------------------------------------------------------------------
>                 Key: HDFS-3192
>                 URL: https://issues.apache.org/jira/browse/HDFS-3192
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude


