hadoop-zookeeper-user mailing list archives

From Austin Shoemaker <aus...@cooliris.com>
Subject Re: Leader election stalled
Date Tue, 16 Sep 2008 10:22:28 GMT
Ben,

Here is a proposed fix for the deadlock issue in QuorumCnxManager.

The protocol starts in one of two ways. Either we are the initiator and invoke
handleConnection(socket_out), where socket_out is our connection to a remote
peer, or an incoming connection triggers handleConnection(socket_in) before we
have initiated a connection to that peer. In the event that we and the peer
both initiate connections, these two calls to handleConnection will proceed on
different threads in the same peer.

Per-peer instance variables
myVersion = 0
myChallenge = genChallenge()

"socket" is the connection to the peer.

boolean handleConnection(socket) throws Exception {
    done = false
    wins = false

    while (!done) {
        // Send the current version and challenge to the peer, then wait
        // for it to send its current version and challenge.
        // The read is blocking, though we expect the peer to write since
        // reads and writes are matched.
        synchronized (challengeLock) {
            socket.write(myVersion, myChallenge)
        }
        peerVersion, peerChallenge = socket.read()

        synchronized (challengeLock) {
            // If peer is obsolete, bring it up to date.
            if (peerVersion < myVersion) {
                continue
            }

            // If we are obsolete, wait to be brought up to date.
            if (peerVersion > myVersion) {
                myVersion = peerVersion
                myChallenge = genChallenge()
                continue
            }

            assert(myVersion == peerVersion)

            // Challenges are compared, resulting in win, lose, or retry.
            if (myChallenge > peerChallenge) {
                wins = true
                done = true
            } else if (myChallenge < peerChallenge) {
                done = true
            } else {
                ++myVersion
                myChallenge = genChallenge()
            }
        }
    }

    // We return true if we won, otherwise we return false. Either we or
    // the peer will win, not both. If a connection error occurs, this
    // method will throw an exception.
    return wins
}
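
To make the pseudocode above easier to check, here is a rough Java sketch of just the version/challenge comparison, with the sockets removed. It is only an illustration under assumptions, not the QuorumCnxManager code: the names PeerState, Outcome, step, aWins and seq are hypothetical, a LongSupplier stands in for genChallenge(), and the two ends are driven in lock-step on a single thread, so it does not exercise challengeLock or the two-threads-per-peer case.

```java
import java.util.function.LongSupplier;

public class ChallengeHandshake {
    enum Outcome { WIN, LOSE, RETRY }

    /** Hypothetical holder for the per-peer variables myVersion and myChallenge. */
    static final class PeerState {
        long version = 0;
        long challenge;
        private final LongSupplier gen; // assumption: stands in for genChallenge()

        PeerState(LongSupplier gen) {
            this.gen = gen;
            this.challenge = gen.getAsLong();
        }

        /** One pass through the loop body, after the peer's message has been read. */
        synchronized Outcome step(long peerVersion, long peerChallenge) {
            if (peerVersion < version) {
                return Outcome.RETRY;          // peer is obsolete: resend our state
            }
            if (peerVersion > version) {       // we are obsolete: adopt the peer's version
                version = peerVersion;
                challenge = gen.getAsLong();
                return Outcome.RETRY;
            }
            if (challenge > peerChallenge) return Outcome.WIN;
            if (challenge < peerChallenge) return Outcome.LOSE;
            version++;                         // tie: start a new round
            challenge = gen.getAsLong();
            return Outcome.RETRY;
        }
    }

    /** Drives both ends in lock-step and returns true if 'a' wins. */
    static boolean aWins(PeerState a, PeerState b) {
        while (true) {
            long av = a.version, ac = a.challenge; // what a would write
            long bv = b.version, bc = b.challenge; // what b would write
            Outcome ra = a.step(bv, bc);
            Outcome rb = b.step(av, ac);
            if (ra == Outcome.RETRY || rb == Outcome.RETRY) continue;
            if (ra == rb) throw new IllegalStateException("both ends reached " + ra);
            return ra == Outcome.WIN;
        }
    }

    /** Deterministic challenge source for the example (hypothetical helper). */
    static LongSupplier seq(long... values) {
        int[] next = {0};
        return () -> values[next[0]++];
    }

    public static void main(String[] args) {
        PeerState a = new PeerState(seq(7, 3)); // first challenges tie at 7
        PeerState b = new PeerState(seq(7, 9));
        // prints "a wins: false, version 1"
        System.out.println("a wins: " + aWins(a, b) + ", version " + a.version);
    }
}
```

In this lock-step setting the tie on the first exchange bumps both versions to 1, and the second exchange is decided by the new challenges, so exactly one end reports a win. Whether the same holds when two handleConnection threads share the per-peer state is the part this sketch does not cover.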

Do you think it's correct? I wonder if there is a way to simplify this  
protocol.

Austin

On Sep 12, 2008, at 4:51 PM, Austin Shoemaker wrote:

> Ben,
>
> I am able to run algorithm 3 successfully sometimes, though  
> frequently the servers deadlock in  
> QuorumCnxManager:initiateConnection on s.read(msgBuffer) when  
> reading the challenge from the peer.
>
> Calls to initiateConnection and receiveConnection are synchronized,  
> so only one or the other can be executing at a time. This prevents  
> two connections from opening between the same pair of servers.
>
> However, it seems that this leads to deadlock, as in this scenario:
>
> A (initiate --> B)
> B (initiate --> C)
> C (initiate --> A)
>
> initiateConnection can only complete when receiveConnection runs on  
> the remote peer and answers the challenge. If all servers are  
> blocked in initiateConnection, receiveConnection never runs and  
> leader election halts.
>
> Looking forward to your thoughts.
>
> Thanks,
>
> Austin
>
> On Sep 2, 2008, at 10:14 AM, Benjamin Reed wrote:
>
>> Austin,
>>
>> Could you try using the new leader election algorithm? You need to set
>> the algorithm type to 3 and you also need to set the election port (TCP)
>> to be used.
>>
>> See http://zookeeper.wiki.sourceforge.net/ZooKeeperConfiguration for
>> more details.
>>
>> ben
>>
>> -----Original Message-----
>> From: Austin Shoemaker [mailto:austin@cooliris.com]
>> Sent: Tuesday, September 02, 2008 9:57 AM
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Leader election stalled
>>
>> Hi,
>>
>> We have run into a situation where killing the leader results in followers
>> perpetually trying to reelect that leader.
>>
>> We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients connecting
>> at random. We kill the leader and observe the impact, monitoring a script
>> that repeatedly prints the responses to "ruok" and "stat". All servers
>> except the killed leader respond with "imok" and "ZooKeeperServer not
>> running", respectively.
>>
>> About half of the time, each remaining server gets into a loop of failing to
>> connect to the killed leader and then reelecting the killed leader.
>>
>> Here is an example log, which is representative of similar logs on the other
>> servers. We additionally logged connectivity during leader election. If
>> anyone would like complete logs, let me know.
>>
>> Thanks,
>>
>> Austin Shoemaker
>>
>> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
>> *WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889*
>> ERROR - [QuorumPeer:Follower@137] - FIXMSG
>> java.net.ConnectException: Connection refused
>> *.... cont'd ....*
>>
>> ERROR - [QuorumPeer:Follower@364] - FIXMSG
>> java.lang.Exception: shutdown Follower
>>       at com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
>>       at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
>> WARN  - [QuorumPeer:QuorumPeer@388] - LOOKING
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2888
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.22:2888
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2888
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2888
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2888
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2888
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2888
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2888
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2890
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2890
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2890
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2890
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2889
>> *WARN  - [QuorumPeer:LeaderElection@166] - ----> Exception occurred when sending / receiving packet to / from /10.50.65.22:2889
>> java.net.SocketTimeoutException: Receive timed out*
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2890
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2890
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2889
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2889
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2889
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2889
>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2889
>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2889
>> WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
>> WARN  - [QuorumPeer:LeaderElection@95] - 8 -> 1
>> WARN  - [QuorumPeer:LeaderElection@95] - 4 -> 1
>> WARN  - [QuorumPeer:LeaderElection@95] - 7 -> 8
>> WARN  - [QuorumPeer:LeaderElection@97] - ----> Election complete, result.winner = 7
>> *WARN  - [QuorumPeer:LeaderElection@100] - ----> Election complete, address = /10.50.65.22:2889
>> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
>> WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889
>> ERROR - [QuorumPeer:Follower@137] - FIXMSG
>> java.net.ConnectException: Connection refused*
>>       at java.net.PlainSocketImpl.socketConnect(Native Method)
>>       at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>       at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>       at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>       at java.net.Socket.connect(Socket.java:519)
>>       at com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
>>       at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
>

