zookeeper-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karolos Antoniadis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3478) Leader restart shuts down all the followers
Date Sun, 11 Aug 2019 01:53:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904537#comment-16904537
] 

Karolos Antoniadis commented on ZOOKEEPER-3478:
-----------------------------------------------

Hi Lara,

unless I'm missing something, your ZooKeeper configuration seems unusual. As you said, if
server-11 crashes, your ZK cluster becomes unavailable and you can only tolerate the failure
of one specific physical server, that of server-12. Furthermore, you could have used 3 ZK
servers in total and potentially gain some performance benefits (e.g., faster writes) due
to the smaller quorum, although this probably depends on the workload. Why not use 3 physical
servers where in each physical server a ZK server is running? Then, your system remains available
if *any* of the 3 servers crashes.

Regarding your first question: It is *normal* behaviour that all the followers shutdown during
a leader election. Since there is no leader after a leader crash, the servers that used to
be followers are not followers anymore. So the followers shutdown and go back to {{LOOKING}} state
in order to find the new leader. Have a look at the code  [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1380].
If the leader crashes, {{followLeader}} throws an exception and the follower is subsequently
{{shutdown}}.


Later on, you state that 20 becomes the leader and indeed this seems to be the case. However,
note that the notification messages received after leader election seem to suggest that servers
10, 12, 11 think that 21 is the actual leader since they have {{21 (n.leader)}}. What might
be happening here is something akin to a race condition. For example, the following steps
might have taken place:
1) Assume, server 20 receives enough notifications to become the leader.
2) Before server 20 changes its state to {{LEADING}}, server 21 is back up online and starts
a leader election by sending notification messages to the other servers
3) The remaining servers agree that 21 is the new leader.
4) Server 20 changes its state to {{LEADING}} and tries to {{getEpochToPropose}} but fails
since the other servers consider 21 to be the leader now.
This would explain why servers 10, 11, and 12 try to connect to server 21 instead of 20 as
you mention. As a matter of fact, I managed to reproduce the aforementioned behaviour in the [3.4.10
release|https://github.com/apache/zookeeper/releases/tag/release-3.4.10].

You mentioned that "*The restart was executed in a small amount of time"* If the time between
restarts was longer, then  I believe the issue should not appear.


However, I'm not sure about this: "After 3 unsuccessfull retries from servers 10,11,12, since
the quorum can not be achieved, connection times out and followers started to shut down again,
After they are up, another election is triggered and new LEADER is now located on the first
node (Server that becomes a new leader is 12):" I did not manage to reproduce this behaviour.


Are you able to consistently reproduce the issues you mentioned every time you restart the
servers?

 

Cheers,
Karolos

> Leader restart shuts down all the followers
> -------------------------------------------
>
>                 Key: ZOOKEEPER-3478
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3478
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10
>            Reporter: Lara Catipovic
>            Priority: Major
>
> Hello ZooKeeper Community,
> Could you please help me with at least clarifying a few doubts related to ZooKeeper 3.4.10?
>  We have 2 servers in our system, one with 2 Zookeeper servers and the one with 3 - meaning
that in case of failure of the server with 3 Zookeeper servers, the quorum cannot be achieved.
> *Server 11*
>  Zookeeper server 10
>  Zookeeper server 11
>  Zookeeper server 12
> *Server 12*
>  Zookeeper server 20
>  Zookeeper server 21 -> Leader at the beginning of the procedure
> As we were changing something in the configuration, it was needed to restart our servers,
and to keep the quorum up, we restarted servers one by one (first on the one with 3 servers
and then the other with 2 servers).
>  During the restart of the one with 3 servers, the quorum was not lost - since we restarted
one by one.
>  Then we tried to restart the servers on the other one where we have 2 Servers deployed,
one by one also. 
>  The restart was executed in a small amount of time. After we restarted the first server
20 (follower) it joined the quorum with no errors, as expected. 
>  *After we restarted the Leader server (21), all followers started to shut down!*
> We had the same log on all the followers, but here is the example from the follower 20:
> {panel}
> Jun 27 14:49:31 [myid: 20]: WARN Connection broken for id 21, my id = 20, error =
>  Jun 27 14:49:31 javaOFException
>  Jun 27 14:49:31 at java.io.DataInputStream.readInt(Unknown Source)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1013)
>  Jun 27 14:49:31 [myid: 20]: INFO Accepted socket connection from /192.168.1.116:18532
>  Jun 27 14:49:31 [myid: 20]: WARN Exception when following the leader
>  Jun 27 14:49:31 OFException
>  Jun 27 14:49:31 at java.io.DataInputStream.readInt(Unknown Source)
>  Jun 27 14:49:31 at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>  Jun 27 14:49:31 at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>  Jun 27 14:49:31 [myid: 20]: WARN Connection request from old client /192.168.1.116:18532;
will be dropped if server is in r-o mode
>  Jun 27 14:49:31 [myid: 20]: INFO Notification: 1 (message format version), 12 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) FOLLOWING
(my state)
>  Jun 27 14:49:31 [myid: 20]: WARN Interrupting SendWorker
>  Jun 27 14:49:31 [myid: 20]: INFO Client attempting to renew session 0xa6b9dc92aa60200
at /192.168.1.116:18532
>  Jun 27 14:49:31 [myid: 20]: INFO shutdown called
>  Jun 27 14:49:31 java.lang.Exception: shutdown Follower
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>  Jun 27 14:49:31 [myid: 20]: INFO Revalidating client: 0xa6b9dc92aa60200
>  Jun 27 14:49:31 [myid: 20]: WARN Interrupted while waiting for message on queue
>  Jun 27 14:49:31 java.InterruptedException
>  Jun 27 14:49:31 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown
Source)
>  Jun 27 14:49:31 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
Source)
>  Jun 27 14:49:31 at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1097)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
>  Jun 27 14:49:31 at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:932)
> {panel}
> *Is it expected that Leader in case of its restart triggers shut down of all its followers?*

>  This seems to me as an unexpected behavior, but maybe I'm wrong.
>  
> After this step and the servers are up again, 20 tries to become a Leader and server
21 accepts it and tries to follow the new Leader.
>  20 received ACK messages from itself and from 21. 
>  There are also notifications sent about a new Leader to all other Zookeeper servers:
> {panel}
> Jun 27 14:49:31 [myid: 20]: INFO LEADING
>  Jun 27 14:49:31 [myid: 20]: INFO Created server with tickTime 1500 minSessionTimeout
3000 maxSessionTimeout 30000 datadir /local/cudb/BCServer/version-2 snapdir /local/cudb/BCServer/version-2
>  Jun 27 14:49:31 [myid: 20]: INFO LEADING - LEADER ELECTION TOOK - 213
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x1 (n.round), LOOKING (n.state), 21 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 21 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Follower sid: 21 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466717f0
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 12 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 11 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
>  Jun 27 14:49:32 [myid: 20]: INFO Notification: 1 (message format version), 21 (n.leader),
0x66000012c7 (n.zxid), 0x19 (n.round), LOOKING (n.state), 10 (n.sid), 0x66 (n.peerEpoch) LEADING
(my state)
> {panel}
>  
> From the servers 10, 11, 12 (located on the server with 3 ZooKeeper servers) it can be
seen they all entered the state 'FOLLOWING' and from this step we would expect the Leader
to start leading, and followers to start following:
> {panel}
> Jun 27 14:49:32 [myid: 12]: INFO FOLLOWING
>  Jun 27 14:49:32 [myid: 12]: INFO Created server with tickTime 1500 minSessionTimeout
3000 maxSessionTimeout 30000 datadir
>  Jun 27 14:49:32 [myid: 12]: INFO FOLLOWING - LEADER ELECTION TOOK - 1217
> {panel}
>  
> But, servers from our first system (10,11,12) are not able to connect to new Leader (20),
and it seems they are trying to connect to old Leader (21) (assuming this is the case since
they are all using port 4512 which corresponds to Server 21). This log can be seen on all
servers where we have 3 ZooKeeper servers deployed (10,11,12):
> {panel}
> Jun 27 14:49:38 [myid: 12]: WARN Unexpected exception, tries=0, connecting to /192.168.1.116:4512
>  Jun 27 14:49:38 java.net.SocketTimeoutException: connect timed out
>  Jun 27 14:49:38 at java.net.PlainSocketImpl.socketConnect(Native Method)
>  Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
>  Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
>  Jun 27 14:49:38 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
>  Jun 27 14:49:38 at java.net.SocksSocketImpl.connect(Unknown Source)
>  Jun 27 14:49:38 at java.net.Socket.connect(Unknown Source)
>  Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
>  Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
>  Jun 27 14:49:38 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> {panel}
> *Is my understanding correct, or this port is not indicating they are trying to connect
to the wrong server?*
>  
> Even if the Leader restart provoked the restart of all its followers, seems that in this
step all other servers should start to connect to the new leader and form a quorum of followers.
>  Instead of that scenario, a few seconds later, timeout occurs while waiting for epoch
for quorum (the followers never start following although they all received notifications,
and they try to connect to old leader) and the 'new' Leader shuts down again:
> {panel}
> Jun 27 14:49:39 [myid: 20]: WARN Unexpected exception
>  *Jun 27 14:49:39 java.lang.InterruptedException: Timeout while waiting for epoch from
quorum*
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:896)
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:389)
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:950)
>  Jun 27 14:49:39 [myid: 20]: INFO Shutting down
>  Jun 27 14:49:39 [myid: 20]: INFO Shutdown called
>  Jun 27 14:49:39 java.lang.Exception: shutdown Leader! reason: Forcing shutdown
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:956)
>  Jun 27 14:49:39 [myid: 20]: INFO exception while shutting down acceptor: java.net.SocketException:
Socket closed
>  Jun 27 14:49:39 [myid: 20]: INFO LOOKING
>  Jun 27 14:49:39 [myid: 20]: INFO New election. My id = 20, proposed zxid=0x66000012c7
>  Jun 27 14:49:39 [myid: 20]: ERROR Unexpected exception causing shutdown
>  Jun 27 14:49:39 java.InterruptedException
>  Jun 27 14:49:39 at java.lang.Object.wait(Native Method)
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:892)
>  Jun 27 14:49:39 at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
>  Jun 27 14:49:39 [myid: 20]: INFO Notification: 1 (message format version), 20 (n.leader),
0x66000012c7 (n.zxid), 0x1a (n.round), LOOKING (n.state), 20 (n.sid), 0x66 (n.peerEpoch) LOOKING
(my state)
>  Jun 27 14:49:39 [myid: 20]: WARN ******* GOODBYE /10.22.0.2:55268 ********
> {panel}
>  
> After 3 unsuccessfull retries from servers 10,11,12, since the quorum can not be achieved,
connection times out and followers started to shut down again, After they are up, another
election is triggered and new LEADER is now located on the first node (Server that becomes
a new leader is 12):
> {panel}
> Jun 27 14:50:07 [myid: 12]: ERROR Unexpected exception
>  Jun 27 14:50:07 java.net.SocketTimeoutException: connect timed out
>  Jun 27 14:50:07 at java.net.PlainSocketImpl.socketConnect(Native Method)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
>  Jun 27 14:50:07 at java.net.SocksSocketImpl.connect(Unknown Source)
>  Jun 27 14:50:07 at java.net.Socket.connect(Unknown Source)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>  Jun 27 14:50:07 [myid: 12]: WARN Exception when following the leader
>  Jun 27 14:50:07 java.net.SocketTimeoutException: connect timed out
>  Jun 27 14:50:07 at java.net.PlainSocketImpl.socketConnect(Native Method)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
>  Jun 27 14:50:07 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
>  Jun 27 14:50:07 at java.net.SocksSocketImpl.connect(Unknown Source)
>  Jun 27 14:50:07 at java.net.Socket.connect(Unknown Source)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>  Jun 27 14:50:07 [myid: 12]: INFO shutdown called
>  Jun 27 14:50:07 java.lang.Exception: shutdown Follower
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>  Jun 27 14:50:07 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>  Jun 27 14:50:07 [myid: 12]: INFO Shutting down
>  Jun 27 14:50:07 [myid: 12]: INFO LOOKING
>  Jun 27 14:50:07 [myid: 12]: INFO New election. My id = 12, proposed zxid=0x66000012c7
>  Jun 27 14:50:08 [myid: 12]: INFO LEADING
> {panel}
>  
> After this, all other Zookeeper servers normally start to follow the new leader and everything
starts to work just fine.
>  
> Could you please help me and answer the following questions:
>  - is it expected behavior that Leader shutdowns all other servers (followers) during
after its own restart?
>  -> if this is expected, could you please explain in which situations we can expect
this behavior and why?
>  - if there was a notification sent about the new leader (20) to all other servers, why
they were still connecting to old leader?
>  - do you have any recommendations on how to 'fix' this behavior?
> Any help will be highly appreciated.
>  Thanks in advance!
> Kind regards,
>  Lara



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Mime
View raw message