hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: Leader election stalled
Date Tue, 02 Sep 2008 17:05:41 GMT
Hi Austin,
Did you kill the leader process? It looks like you didn't kill the
server, since it's responding to ruok. Is that true?

mahadev
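
For anyone who wants to repeat the "ruok" / "stat" check discussed in this thread, here is a minimal sketch of such a probe. It is not the script Austin used. It assumes the four-letter commands are sent as plain ASCII over TCP to each server's client port, and the host/port pairs in the usage comment are placeholders, since the client ports are not given in the thread.

```python
import socket


def four_letter_word(host, port, command, timeout=2.0):
    """Send a four-letter command (e.g. "ruok") and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server is expected to close the connection after replying.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def probe(servers, commands=("ruok", "stat")):
    """Return {(host, port, command): reply or error text} for every server."""
    results = {}
    for host, port in servers:
        for command in commands:
            try:
                reply = four_letter_word(host, port, command).strip()
            except OSError as exc:
                # Connection refused, timeout, and similar socket errors.
                reply = "unreachable: %s" % exc
            results[(host, port, command)] = reply
    return results


# Usage (hypothetical client port 2181, an assumption):
#   for key, reply in probe([("10.50.65.21", 2181), ("10.50.65.22", 2181)]).items():
#       print(key, reply)
```

A killed server should show up as "unreachable" here, while a live server that has not joined a quorum would be expected to answer "imok" to ruok and "ZooKeeperServer not running" to stat, as described in the quoted message.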


On 9/2/08 9:56 AM, "Austin Shoemaker" <austin@cooliris.com> wrote:

> Hi,
> 
> We have run into a situation where killing the leader results in followers
> perpetually trying to reelect that leader.
> 
> We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients connecting
> at random. We kill the leader and observe the impact, monitoring a script
> that repeatedly prints the responses to "ruok" and "stat". All servers
> except the killed leader respond with "imok" and "ZooKeeperServer not
> running", respectively.
> 
> About half of the time, each remaining server gets into a loop of failing to
> connect to the killed leader and then reelecting the killed leader.
> 
> Here is an example log, which is representative of similar logs on the other
> servers. We additionally logged connectivity during leader election. If
> anyone would like complete logs, let me know.
> 
> Thanks,
> 
> Austin Shoemaker
> 
> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
> *WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889*
> ERROR - [QuorumPeer:Follower@137] - FIXMSG
> java.net.ConnectException: Connection refused
> *
> .... cont'd ....*
> 
> ERROR - [QuorumPeer:Follower@364] - FIXMSG
> java.lang.Exception: shutdown Follower
>         at com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
>         at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
> WARN  - [QuorumPeer:QuorumPeer@388] - LOOKING
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2888
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.22:2888
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2888
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2888
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2888
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2888
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2888
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2888
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2890
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2890
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2890
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2890
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.22:2889
> *WARN  - [QuorumPeer:LeaderElection@166] - ----> Exception occurred when sending / receiving packet to / from /10.50.65.22:2889
> java.net.SocketTimeoutException: Receive timed out
> *WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2890
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2890
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.21:2889
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.21:2889
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.12:2889
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.12:2889
> WARN  - [QuorumPeer:LeaderElection@136] - ----> Sending election packet to /10.50.65.11:2889
> WARN  - [QuorumPeer:LeaderElection@153] - ----> Received response from /10.50.65.11:2889
> WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
> WARN  - [QuorumPeer:LeaderElection@95] - 8 -> 1
> WARN  - [QuorumPeer:LeaderElection@95] - 4 -> 1
> WARN  - [QuorumPeer:LeaderElection@95] - 7 -> 8
> WARN  - [QuorumPeer:LeaderElection@97] - ----> Election complete, result.winner = 7
> *WARN  - [QuorumPeer:LeaderElection@100] - ----> Election complete, address = /10.50.65.22:2889
> WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
> WARN  - [QuorumPeer:Follower@124] - Following /10.50.65.22:2889
> ERROR - [QuorumPeer:Follower@137] - FIXMSG
> java.net.ConnectException: Connection refused
> *        at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:519)
>         at com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
>         at com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
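
To make the "Election tally" lines in the quoted log easier to read, here is a simplified sketch of a majority tally. It is not the actual LeaderElection code. It assumes each responding server reports the id it is voting for and that the id with the most votes wins; the tie-break rule and the mapping of which server cast which vote are invented for illustration. Only the totals (8 -> 1, 4 -> 1, 7 -> 8) come from the log.

```python
from collections import Counter


def tally(votes):
    """votes maps a responding server id to the server id it voted for.

    Returns (counts, winner, winner_votes). Ties are broken by the higher
    server id, which is an assumption made for this sketch.
    """
    counts = Counter(votes.values())
    winner, winner_votes = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return counts, winner, winner_votes


# Hypothetical responders: ten servers answered, server 7 (the killed
# leader) did not. Which server cast which vote is made up; the totals
# match the logged tally.
votes = {1: 7, 2: 7, 3: 7, 5: 7, 6: 7, 9: 7, 10: 7, 11: 7, 4: 4, 8: 8}

counts, winner, winner_votes = tally(votes)
print("winner:", winner, "votes:", winner_votes)

# The winner never answered an election packet itself.
print("winner responded:", winner in votes)
```

With these inputs server 7 wins with 8 of the 10 votes even though it is not among the responders, which matches the loop seen in the log: the followers elect 7, fail to connect to /10.50.65.22:2889, and start looking again.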

