hadoop-common-dev mailing list archives

From "Todd Lipcon (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-8217) Edge case split-brain race in ZK-based auto-failover
Date Mon, 26 Mar 2012 22:24:26 GMT

                 Key: HADOOP-8217
                 URL: https://issues.apache.org/jira/browse/HADOOP-8217
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha
    Affects Versions: 0.24.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon

As discussed in HADOOP-8206, the current design for automatic failover has the following race:
- ZKFC1 acquires the active lock in ZK
- ZKFC1 is about to send transitionToActive() to NN1 when its machine freezes (e.g., a GC pause combined with swapping)
- ZKFC1's ZK session expires, so it loses the lock; ZKFC2 then acquires the lock
- ZKFC2 calls transitionToStandby() on NN1 and transitions NN2 to active
- ZKFC1 wakes up from the pause and calls transitionToActive() on NN1, so both NN1 and NN2 are now active (split-brain)
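The interleaving above can be replayed deterministically with a toy model. This is a minimal sketch, not Hadoop code: `NameNode`, `ZkLock` and `replay_race` are hypothetical stand-ins, and the freeze is modeled simply as ZKFC1 acting on a decision it made before its session expired, without re-checking the lock.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NameNode:
    """Hypothetical stand-in for an HA NameNode: only tracks its HA state."""
    name: str
    state: str = "standby"

    def transition_to_active(self) -> None:
        self.state = "active"

    def transition_to_standby(self) -> None:
        self.state = "standby"


@dataclass
class ZkLock:
    """Hypothetical stand-in for the ZK lock: records the current holder."""
    holder: Optional[str] = None


def replay_race() -> Dict[str, str]:
    nn1, nn2 = NameNode("NN1"), NameNode("NN2")
    lock = ZkLock()

    # ZKFC1 acquires the lock and decides that NN1 should become active.
    lock.holder = "ZKFC1"
    zkfc1_believes_active = lock.holder == "ZKFC1"

    # ZKFC1 freezes longer than its session timeout, so ZK releases the lock.
    lock.holder = None

    # ZKFC2 acquires the lock, moves NN1 to standby and activates NN2.
    lock.holder = "ZKFC2"
    nn1.transition_to_standby()
    nn2.transition_to_active()

    # ZKFC1 wakes up and acts on its stale decision without re-checking the lock.
    if zkfc1_believes_active:
        nn1.transition_to_active()

    return {nn1.name: nn1.state, nn2.name: nn2.state}


if __name__ == "__main__":
    print(replay_race())
```

Running it prints `{'NN1': 'active', 'NN2': 'active'}`: nothing in the model ties ZKFC1's late call to the lock it no longer holds.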

This is rare, since it requires ZKFC1 to freeze for longer than its ZK session timeout, but it is worth
fixing because the results can be disastrous.
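One general mitigation for this class of race is a fencing token (epoch). It is not necessarily the fix adopted for this issue. Each time a ZKFC wins the lock it receives a strictly larger epoch, and a NameNode rejects any HA transition carrying an epoch older than the highest one it has already seen. The sketch below assumes such an epoch exists (for example, a counter bumped in ZK on each lock acquisition). `FencedNameNode`, `StaleEpochError` and `replay_race_with_epochs` are hypothetical names, not Hadoop APIs.

```python
from typing import Dict, Tuple


class StaleEpochError(Exception):
    """Raised when a request carries an epoch older than one already seen."""


class FencedNameNode:
    """Hypothetical NameNode wrapper that rejects HA transitions with stale epochs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "standby"
        self.highest_epoch = 0  # largest epoch observed so far

    def _check_epoch(self, epoch: int) -> None:
        # Reject callers whose lock ownership has been superseded.
        if epoch < self.highest_epoch:
            raise StaleEpochError(f"{self.name}: epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch

    def transition_to_active(self, epoch: int) -> None:
        self._check_epoch(epoch)
        self.state = "active"

    def transition_to_standby(self, epoch: int) -> None:
        self._check_epoch(epoch)
        self.state = "standby"


def replay_race_with_epochs() -> Tuple[Dict[str, str], bool]:
    nn1, nn2 = FencedNameNode("NN1"), FencedNameNode("NN2")
    zkfc1_epoch = 1  # assumed epoch given to ZKFC1 when it won the lock
    zkfc2_epoch = 2  # assumed epoch given to ZKFC2 after ZKFC1's session expired

    # ZKFC2 demotes NN1 and promotes NN2; NN1 now remembers epoch 2.
    nn1.transition_to_standby(zkfc2_epoch)
    nn2.transition_to_active(zkfc2_epoch)

    # ZKFC1's late call still carries epoch 1, so NN1 refuses it.
    rejected = False
    try:
        nn1.transition_to_active(zkfc1_epoch)
    except StaleEpochError:
        rejected = True

    return {nn1.name: nn1.state, nn2.name: nn2.state}, rejected


if __name__ == "__main__":
    print(replay_race_with_epochs())
```

Running it prints `({'NN1': 'standby', 'NN2': 'active'}, True)`. The check only helps because NN1 saw the newer epoch when ZKFC2 demoted it. If ZKFC2 could not reach NN1, protection would have to come from elsewhere, such as the shared edits storage enforcing the epoch or a separate fencing method.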

This message is automatically generated by JIRA.

