zookeeper-user mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Why does ZooKeeper follower shutdown itself when it can not read from leader
Date Thu, 23 May 2019 01:26:57 GMT
Hi Andor,

I am using ZooKeeper release 3.4.10.

I checked the code: if the follower fails to read from the leader (e.g., a
read timeout), it closes the socket; see
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
for
details. Once the socket is closed, the follower's writes fail as well
(I guess the same socket is used for both), which is treated as a severe
unrecoverable error and shuts the follower down; see
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
 and
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
.
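
As a rough illustration of that sequence, here is a minimal standalone sketch using plain java.net sockets. It is not ZooKeeper code: the class name ClosedSocketSketch, the loopback "leader" that never sends anything, and the 200 ms read timeout are all assumptions made for the example. It only shows that once a read times out and the socket is closed, a later write on the same socket fails with a SocketException, which matches the pattern in the stack traces quoted below.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical demo class; none of these names come from ZooKeeper.
class ClosedSocketSketch {

    /**
     * Returns {readTimedOut, writeFailedAfterClose}.
     * The "leader" side accepts the connection but never sends data,
     * so the "follower" side's blocking read hits its timeout.
     */
    static boolean[] run() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket sock = new Socket(loopback, server.getLocalPort());
            Socket peer = server.accept();   // silent stand-in for the leader
            try {
                sock.setSoTimeout(200);      // assumed timeout, in milliseconds
                // Obtain both streams up front, as a long-lived connection would.
                InputStream in = sock.getInputStream();
                OutputStream out = sock.getOutputStream();

                boolean readTimedOut = false;
                try {
                    in.read();               // blocks until the timeout fires
                } catch (SocketTimeoutException e) {
                    readTimedOut = true;
                    sock.close();            // reader reacts by closing the socket
                }

                boolean writeFailedAfterClose = false;
                try {
                    out.write(1);            // writer still uses the same socket
                    out.flush();
                } catch (SocketException e) {
                    writeFailedAfterClose = true;
                }
                return new boolean[] {readTimedOut, writeFailedAfterClose};
            } finally {
                peer.close();
                sock.close();                // no-op if already closed
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = run();
        System.out.println("read timed out: " + result[0]);
        System.out.println("write failed after close: " + result[1]);
    }
}
```

In this sketch the read and the write happen on one thread for simplicity; in the logs they come from two different threads (QuorumPeer and FollowerRequestProcessor), which is why the write failure shows up as a separate error.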

So it seems that shutting down the follower when it cannot read from the
leader is the intended behavior? Or, if my understanding is wrong, can you
please let me know what the intended behavior is in this case? Thanks!


Regards,
Qian Zhang


On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zhq527725@gmail.com> wrote:

> Does anyone have any ideas?
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zhq527725@gmail.com> wrote:
>
>> Hi,
>>
>> I have a ZooKeeper cluster with 5 nodes. Today the leader became
>> unreachable due to a hardware issue, and then I found that the 4 followers
>> just shut down. Here are the logs:
>>
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
>>> java.net.SocketTimeoutException: Read timed out
>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
>>> java.net.SocketException: Socket closed
>>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>> java.lang.Exception: shutdown Follower
>>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>
>>
>> I am confused about why all the followers shut down in this case, which
>> made the whole ZooKeeper cluster unusable for a short period. Shouldn't
>> they elect a new leader instead? Thanks!
>>
>>
>> Regards,
>> Qian Zhang
>>
>
