hadoop-user mailing list archives

From "Paul K. Harter, Jr." <Paul.Har...@oracle.com>
Subject Network partitions and Failover Times
Date Tue, 22 Apr 2014 21:49:13 GMT
I am trying to understand the mechanisms and timing involved when Hadoop is
dealing with a network partition.  Suppose we have a large Hadoop cluster
configured for automatic failover:

1)      Active NameNode

2)      Standby NameNode

3)      Quorum JournalNodes (which we'll ignore for now)

4)      ZooKeeper ensemble with 3 nodes
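
For reference, a setup like this would be wired together with something like
the following (hostnames and values here are made up for illustration, not
taken from any real cluster):

```
# zoo.cfg (on each ZooKeeper node)
tickTime=2000        # one "tick" = 2000 ms
initLimit=10
syncLimit=5          # followers may lag the leader by at most 5 ticks

<!-- core-site.xml / hdfs-site.xml (on the NameNodes) -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>20000</value>  <!-- 10 ticks at tickTime=2000 -->
</property>
```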


Suppose the ZooKeeper session from the Active NameNode happens to be directed
to the ZK leader node, and that a network failure splits the system
into 2 partitions (A and B) with the nodes distributed as follows:

A)      ZooKeeper leader node;
        Active NameNode

B)      2 ZooKeeper followers;
        Standby NameNode



It seems the result should be that both ZooKeeper and the NameNode fail over
to partition B, but I wanted to confirm the sequence of actions as outlined
below.  Does this look right?


If the network failure occurs at time zero, how long should this whole
process take if, for example, syncLimit is 5 ticks and the NameNode
sessionTimeout is 10 ticks?
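
My back-of-the-envelope arithmetic, assuming the default tickTime of 2000 ms
(the election time, and whether the new leader restarts the session clock,
are guesses on my part):

```python
# Rough failover timeline.  Assumes tickTime = 2000 ms (the ZooKeeper
# default), so 1 tick = 2 s; election_ms is an invented estimate.

tick_ms = 2000

# Step 2: the minority-side leader notices the problem after syncLimit.
leader_detects_ms = 5 * tick_ms            # syncLimit = 5 ticks -> 10 s

# Majority-side leader election; usually fast, modeled here as 200 ms.
election_ms = 200

# Step 4: the session dies sessionTimeout after the last heartbeat...
session_timeout_ms = 10 * tick_ms          # 10 ticks -> 20 s

# ...if the new quorum keeps counting from the moment of the failure:
best_case_ms = max(leader_detects_ms + election_ms, session_timeout_ms)

# ...but if the new leader restarts the session clock when it takes
# over, the timeout runs from the election instead:
worst_case_ms = leader_detects_ms + election_ms + session_timeout_ms

print(best_case_ms, worst_case_ms)         # 20000 30200
```

So somewhere between roughly 20 and 30 seconds before the Standby's Watcher
even fires, if I have the clocks right.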



FAILOVER SEQUENCE (as I understand it):


    1) The Leader, which ends up in the minority, loses connection to the
       remaining followers.



    2) After syncLimit, the ZK ensemble realizes there's a problem.  If a
       follower loses connection, it is dropped by the leader and no
       longer participates in voting.

       However, in this case the Leader no longer has quorum, so it has to
       relinquish leadership.  It stops responding to client requests,
       enters the LOOKING state, and starts trying to form/join a quorum.
       (It informs the ZK client library, and) all clients are notified
       with a DISCONNECTED event.  (Or is it that the DISCONNECTED event is
       delivered to the client library, which delivers connection-loss
       exceptions to clients?)

       The remaining nodes on the majority side enter leader election and
       choose a new leader (which starts a new epoch) on the majority side.


    3) All clients that were connected to the (now former) leader are told
       to reconnect, and will either fail, if they can't talk to a node on
       the new majority side, or will succeed in connecting with a node in
       the new majority.

    4) Meanwhile, when the Active NameNode is informed that its server has
       become disconnected (DISCONNECTED event), it must stop acting as
       the Active NameNode.

       When the ZK quorum re-forms and does not get heartbeats from the
       (formerly) Active NameNode, it will eventually (after sessionTimeout)
       declare its session dead.  This deletes the ephemeral znode being
       used to hold its lock on its status as "Active" and triggers the
       Watcher for the Standby NameNode.

       The Standby then competes in the Active NameNode election, and
       should win and become the new Active.
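
The lock handoff in step 4, as I understand it, could be sketched with a toy
in-memory model (the TinyZK class, session names, and nn1/nn2 are all
invented for illustration; the znode path mirrors the one HDFS HA uses):

```python
# Toy model of the ephemeral-lock handoff.  Not real Hadoop/ZooKeeper
# code -- HDFS actually does this via the ActiveStandbyElector.

class TinyZK:
    def __init__(self):
        self.ephemerals = {}   # znode path -> owning session
        self.watches = {}      # znode path -> list of callbacks

    def create_ephemeral(self, path, session):
        if path in self.ephemerals:
            return False       # lock already held by another session
        self.ephemerals[path] = session
        return True

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def expire_session(self, session):
        # Expiring a session deletes its ephemerals and fires watches.
        for path in [p for p, s in self.ephemerals.items() if s == session]:
            del self.ephemerals[path]
            for cb in self.watches.pop(path, []):
                cb(path)

LOCK = "/hadoop-ha/mycluster/ActiveStandbyElectorLock"
zk = TinyZK()
state = {"active": None}

# nn1 holds the lock and is Active before the partition.
assert zk.create_ephemeral(LOCK, "nn1-session")
state["active"] = "nn1"

def standby_watcher(path):
    # The Standby competes for the lock once the znode disappears.
    if zk.create_ephemeral(path, "nn2-session"):
        state["active"] = "nn2"

zk.watch(LOCK, standby_watcher)

# Partition -> the quorum declares nn1's session dead after sessionTimeout:
zk.expire_session("nn1-session")
print(state["active"])   # nn2
```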

