zookeeper-user mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Why does ZooKeeper follower shutdown itself when it can not read from leader
Date Sun, 26 May 2019 08:13:24 GMT
I see, thank you Patrick!


Regards,
Qian Zhang


On Thu, May 23, 2019 at 9:26 AM Qian Zhang <zhq527725@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code: if a follower fails to read from the leader (e.g., on a
> read timeout), it closes the socket; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
> for details. Once the socket is closed, the follower's next write fails
> (I guess the same socket is used here), which is treated as a severe
> unrecoverable error and shuts the follower down; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
> and
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems that shutting down the follower when it cannot read from the
> leader is the designed behavior? Or, if my understanding is wrong, can you
> please let me know the intended behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zhq527725@gmail.com> wrote:
>
>> Does anyone have any ideas?
>>
>> Regards,
>> Qian Zhang
>>
>>
>> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zhq527725@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a ZooKeeper cluster with 5 nodes. Today the leader became
>>> unreachable due to a hardware issue, and then I found that the 4 followers
>>> had just shut down. Here are the logs:
>>>
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
>>>> java.net.SocketTimeoutException: Read timed out
>>>>     at java.net.SocketInputStream.socketRead0(Native Method)
>>>>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
>>>> java.net.SocketException: Socket closed
>>>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>>> java.lang.Exception: shutdown Follower
>>>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>>
>>>
>>> I am confused about why all followers shut down in this case, which makes
>>> the whole ZooKeeper cluster unusable for a short period. Shouldn't they
>>> elect a new leader instead? Thanks!
>>>
>>>
>>> Regards,
>>> Qian Zhang
>>>
>>
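
The socket-level sequence described in the May 23 message (a read times out, the socket is closed, and a later write on the same socket fails) can be reproduced with plain JDK sockets. The sketch below is only an illustration under stated assumptions: it is not ZooKeeper code, and the names ClosedSocketDemo, run, and silentPeer are hypothetical. A local listener that never sends data stands in for the unreachable leader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it does not use any ZooKeeper API.
public class ClosedSocketDemo {

    /** Returns {read timed out?, write failed after close?}. */
    static boolean[] run() throws IOException {
        boolean readTimedOut = false;
        boolean writeFailed = false;

        // silentPeer accepts the connection but never writes to it,
        // standing in for a leader that has become unreachable.
        try (ServerSocket silentPeer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock =
                 new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {

            sock.setSoTimeout(200); // read timeout in milliseconds
            InputStream in = sock.getInputStream();
            OutputStream out = sock.getOutputStream();

            // Reader side: the blocking read times out, and the socket
            // is closed in response.
            try {
                in.read();
            } catch (SocketTimeoutException e) {
                readTimedOut = true;
                sock.close();
            }

            // Writer side: still holds the output stream of the socket
            // that the reader side has just closed.
            try {
                out.write(1);
                out.flush();
            } catch (SocketException e) {
                writeFailed = true;
            }
        }
        return new boolean[] {readTimedOut, writeFailed};
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("read timed out: " + r[0]);
        System.out.println("write failed after close: " + r[1]);
    }
}
```

Both flags are expected to be true on a typical JVM, which matches the order of the two exceptions in the logs above: the SocketTimeoutException on the reading thread, followed by the SocketException on the thread that writes.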
