hadoop-zookeeper-user mailing list archives

From "Benjamin Reed" <br...@yahoo-inc.com>
Subject RE: Re: Leader election stalled
Date Tue, 16 Sep 2008 20:17:09 GMT
Austin,

I have added a patch to ZOOKEEPER-131 to fix (and reproduce) your problem. Can you give it
a try?

Thanx
Ben


 -----Original Message-----
From: 	Austin Shoemaker [mailto:austin@cooliris.com]
Sent:	Tuesday, September 16, 2008 04:38 AM Pacific Standard Time
To:	zookeeper-user@hadoop.apache.org
Subject:	Re: Leader election stalled

Got it, thanks! I believe the problem still exists; please see my
comment.

Best,
Austin

On Sep 16, 2008, at 3:26 AM, Flavio Junqueira wrote:

> Austin, Please check:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-140
>
> Thanks,
> -Flavio
>
>> -----Original Message-----
>> From: Austin Shoemaker [mailto:austin@cooliris.com]
>> Sent: Tuesday, September 16, 2008 12:22 PM
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Re: Leader election stalled
>>
>> Ben,
>>
>> Here is a proposed fix for the deadlock issue in QuorumCnxManager.
>>
>> The protocol starts either when we, as initiator, invoke
>> handleConnection(socket_out), where socket_out is our outgoing
>> connection to a remote peer, or when an incoming connection triggers
>> handleConnection(socket_in) before we have initiated a connection to
>> that peer. If we and the peer both initiate connections, the two calls
>> to handleConnection proceed on different threads within the same peer.
>>
>> Per-peer instance variables
>> myVersion = 0
>> myChallenge = genChallenge()
>>
>> "socket" is the connection to the peer.
>>
>> boolean handleConnection(socket) throws Exception {
>>     done = false
>>     wins = false
>>
>>     while (!done) {
>>         // Send the current version and challenge to the peer, then
>>         // wait for it to send its current version and challenge.
>>         // The read is blocking though we expect the peer to write
>>         // since reads and writes are matched.
>>         synchronized (challengeLock) {
>>             socket.write(myVersion, myChallenge)
>>         }
>>         peerVersion, peerChallenge = socket.read()
>>
>>         synchronized (challengeLock) {
>>             // If peer is obsolete, bring it up to date.
>>             if (peerVersion < myVersion) {
>>                 continue
>>             }
>>
>>             // If we are obsolete, wait to be brought up to date.
>>             if (peerVersion > myVersion) {
>>                 myVersion = peerVersion
>>                 myChallenge = genChallenge()
>>                 continue
>>             }
>>
>>             assert(myVersion == peerVersion)
>>
>>             // Challenges are compared, resulting in win, lose, or
>>             // retry.
>>             if (myChallenge > peerChallenge) {
>>                 wins = true
>>                 done = true
>>             } else if (myChallenge < peerChallenge) {
>>                 done = true
>>             } else {
>>                 ++myVersion
>>                 myChallenge = genChallenge()
>>             }
>>         }
>>     }
>>
>>     // We return true if we won, otherwise we return false. Either we
>>     // or the peer will win, not both. If a connection error occurs,
>>     // this method will throw an exception.
>>     return wins
>> }
>>
>> Do you think it's correct? I wonder if there is a way to simplify this
>> protocol.
>>
>> Austin
>>
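
The handleConnection pseudocode above can be exercised without sockets. What follows is a minimal Java sketch under stated assumptions, not the QuorumCnxManager code: it assumes both ends of one connection exchange messages in lockstep, and the names ChallengeHandshakeSketch, Peer, step, and handshake are hypothetical. Challenges come from a caller-supplied iterator (standing in for genChallenge()) so that a tie can be forced deliberately.

```java
import java.util.Arrays;
import java.util.Iterator;

class ChallengeHandshakeSketch {
    enum Outcome { WIN, LOSE, RETRY }

    // Hypothetical stand-in for the per-peer state (myVersion, myChallenge).
    static final class Peer {
        long version = 0;
        long challenge;
        private final Iterator<Long> challenges; // replaces genChallenge()

        Peer(Iterator<Long> challenges) {
            this.challenges = challenges;
            this.challenge = challenges.next();
        }

        // One pass through the second synchronized block of the pseudocode,
        // given the version and challenge just read from the other side.
        synchronized Outcome step(long peerVersion, long peerChallenge) {
            if (peerVersion < version) {
                return Outcome.RETRY;           // peer is obsolete, resend ours
            }
            if (peerVersion > version) {
                version = peerVersion;          // we are obsolete, catch up
                challenge = challenges.next();
                return Outcome.RETRY;
            }
            if (challenge > peerChallenge) {
                return Outcome.WIN;
            }
            if (challenge < peerChallenge) {
                return Outcome.LOSE;
            }
            version++;                          // tie: bump version and redraw
            challenge = challenges.next();
            return Outcome.RETRY;
        }
    }

    // Runs both ends in lockstep (an assumption of this sketch) and returns
    // true if a wins, false if b wins.
    static boolean handshake(Peer a, Peer b) {
        while (true) {
            // Snapshot what each side "writes" before either side "reads".
            long av = a.version, ac = a.challenge;
            long bv = b.version, bc = b.challenge;
            Outcome ra = a.step(bv, bc);
            Outcome rb = b.step(av, ac);
            if (ra == Outcome.RETRY && rb == Outcome.RETRY) {
                continue;
            }
            if (ra == Outcome.WIN && rb == Outcome.LOSE) {
                return true;
            }
            if (ra == Outcome.LOSE && rb == Outcome.WIN) {
                return false;
            }
            throw new IllegalStateException("inconsistent outcomes: " + ra + ", " + rb);
        }
    }

    public static void main(String[] args) {
        // Both peers draw 5 first (a tie), then redraw 9 and 3.
        Peer a = new Peer(Arrays.asList(5L, 9L).iterator());
        Peer b = new Peer(Arrays.asList(5L, 3L).iterator());
        System.out.println("a wins: " + handshake(a, b));
    }
}
```

The sketch covers only the decision logic. It does not model two connections racing on separate threads within one peer, which is the case the proposal is meant to handle, so it does not by itself show the protocol is deadlock-free.
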
>> On Sep 12, 2008, at 4:51 PM, Austin Shoemaker wrote:
>>
>>> Ben,
>>>
>>> I am able to run algorithm 3 successfully sometimes, though
>>> frequently the servers deadlock in
>>> QuorumCnxManager:initiateConnection on s.read(msgBuffer) when
>>> reading the challenge from the peer.
>>>
>>> Calls to initiateConnection and receiveConnection are synchronized,
>>> so only one or the other can be executing at a time. This prevents
>>> two connections from opening between the same pair of servers.
>>>
>>> However, it seems that this leads to deadlock, as in this scenario:
>>>
>>> A (initiate --> B)
>>> B (initiate --> C)
>>> C (initiate --> A)
>>>
>>> initiateConnection can only complete when receiveConnection runs on
>>> the remote peer and answers the challenge. If all servers are
>>> blocked in initiateConnection, receiveConnection never runs and
>>> leader election halts.
>>>
>>> Looking forward to your thoughts.
>>>
>>> Thanks,
>>>
>>> Austin
>>>
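
The A, B, C scenario above is a cycle in a wait-for graph: each server blocked in initiateConnection is waiting on exactly one peer. As an illustration only, here is a small Java sketch that checks such a graph for a cycle. The class and method names (WaitForCycleSketch, hasCycle) are hypothetical and are not part of ZooKeeper, and the map is a simplified model of the situation rather than anything the servers actually maintain.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WaitForCycleSketch {
    // waitsFor maps a server blocked in initiateConnection to the peer whose
    // receiveConnection it is waiting on. A server that is absent from the
    // map is assumed to be free to run receiveConnection.
    static boolean hasCycle(Map<String, String> waitsFor) {
        for (String start : waitsFor.keySet()) {
            Set<String> seen = new HashSet<>();
            String cur = start;
            // Follow the chain until it ends or revisits a server.
            while (cur != null && seen.add(cur)) {
                cur = waitsFor.get(cur);
            }
            if (cur != null) {
                return true; // revisited a server: nobody on the cycle can proceed
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("A", "B");
        waitsFor.put("B", "C");
        waitsFor.put("C", "A");
        System.out.println("deadlocked: " + hasCycle(waitsFor));
    }
}
```

If any server on the chain is not itself blocked (for example, C has not initiated a connection), the chain ends there and the check reports no cycle, matching the observation that the election only sometimes stalls.
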
>>> On Sep 2, 2008, at 10:14 AM, Benjamin Reed wrote:
>>>
>>>> Austin,
>>>>
>>>> Could you try using the new leader election algorithm? You need to set
>>>> the algorithm type to 3 and you also need to set the election port (TCP)
>>>> to be used.
>>>>
>>>> See http://zookeeper.wiki.sourceforge.net/ZooKeeperConfiguration for
>>>> more details.
>>>>
>>>> ben
>>>>
>>>> -----Original Message-----
>>>> From: Austin Shoemaker [mailto:austin@cooliris.com]
>>>> Sent: Tuesday, September 02, 2008 9:57 AM
>>>> To: zookeeper-user@hadoop.apache.org
>>>> Subject: Leader election stalled
>>>>
>>>> Hi,
>>>>
>>>> We have run into a situation where killing the leader results in
>>>> followers perpetually trying to reelect that leader.
>>>>
>>>> We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients
>>>> connecting at random. We kill the leader and observe the impact,
>>>> monitoring a script that repeatedly prints the responses to "ruok" and
>>>> "stat". All servers except the killed leader respond with "imok" and
>>>> "ZooKeeperServer not running", respectively.
>>>>
>>>> About half of the time, each remaining server gets into a loop of
>>>> failing to connect to the killed leader and then reelecting the killed
>>>> leader.
>>>>
>>>> Here is an example log, which is representative of similar logs on the
>>>> other servers. We additionally logged connectivity during leader
>>>> election. If anyone would like complete logs, let me know.
>>>>
>>>> Thanks,
>>>>
>>>> Austin Shoemaker
>>>>
>>>> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
>>>> *WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889*
>>>> ERROR - [QuorumPeer:Follower@137] - FIXMSG
>>>> java.net.ConnectException: Connection refused
>>>> *.... cont'd ....*
>>>>
>>>> ERROR - [QuorumPeer:Follower@364] - FIXMSG
>>>> java.lang.Exception: shutdown Follower
>>>>      at com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
>>>>      at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
>>>> WARN  - [QuorumPeer:QuorumPeer@388] - LOOKING
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2888
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.22:2888
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2888
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2888
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2888
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2888
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2888
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2888
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2890
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2890
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2890
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2890
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2889
>>>> *WARN  - [QuorumPeer:LeaderElection@166] - ----> Exception occurred when sending / receiving packet to / from /10.50.65.22:2889
>>>> java.net.SocketTimeoutException: Receive timed out*
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2890
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2890
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2889
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2889
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2889
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2889
>>>> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2889
>>>> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2889
>>>> WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
>>>> WARN  - [QuorumPeer:LeaderElection@95] - 8 -> 1
>>>> WARN  - [QuorumPeer:LeaderElection@95] - 4 -> 1
>>>> WARN  - [QuorumPeer:LeaderElection@95] - 7 -> 8
>>>> WARN  - [QuorumPeer:LeaderElection@97] - ----> Election complete, result.winner = 7
>>>> *WARN  - [QuorumPeer:LeaderElection@100] - ----> Election complete, address = /10.50.65.22:2889
>>>> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
>>>> WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889
>>>> ERROR - [QuorumPeer:Follower@137] - FIXMSG
>>>> java.net.ConnectException: Connection refused*
>>>>      at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>      at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>>      at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>>      at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>>      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>>      at java.net.Socket.connect(Socket.java:519)
>>>>      at com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
>>>>      at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
>>>
>
>

