zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Why does ZooKeeper follower shutdown itself when it can not read from leader
Date Thu, 23 May 2019 04:18:49 GMT
That was/is the original intent.  ZK was built to "fail fast" when it
didn't know how to handle a particular case, or that case might be error
prone to handle. The expectation is that the parent will restart the ZK
server process when it fails.
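
A minimal sketch of that "parent restarts the server" expectation, written
as a tiny Java supervisor. This is an illustration only and not part of
ZooKeeper: the class name RestartSupervisor, the backoff values, and the
restart limit are all assumptions, and the command to run is taken from
the command line. In practice a service manager such as systemd or
supervisord usually plays this role.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical supervisor: restarts a child process whenever it exits.
public class RestartSupervisor {

    // Exponential backoff: base * 2^attempt, capped at maxMillis.
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        int shift = Math.max(0, Math.min(attempt, 20)); // clamp to avoid overflow
        return Math.min(maxMillis, baseMillis << shift);
    }

    // Runs the command, waits for it to exit, and starts it again,
    // up to maxRestarts restarts after the first run.
    static void supervise(List<String> command, int maxRestarts)
            throws IOException, InterruptedException {
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            Process child = new ProcessBuilder(command).inheritIO().start();
            int exitCode = child.waitFor();
            System.err.println("child exited with code " + exitCode);
            if (attempt < maxRestarts) {
                // Assumed values: 1 s base delay, 30 s cap.
                Thread.sleep(backoffMillis(attempt, 1_000, 30_000));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: RestartSupervisor <command> [args...]");
            return;
        }
        supervise(Arrays.asList(args), 10); // assumed restart limit
    }
}
```

It could be invoked with the server's foreground start command as its
arguments, so that the supervisor sees the process exit when the server
fails fast.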

Patrick

On Wed, May 22, 2019 at 6:27 PM Qian Zhang <zhq527725@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code: if a follower fails to read from the leader (e.g., on a
> read timeout), it closes the socket; see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
> for
> details. Once the socket is closed, the follower's next write fails (I
> guess the same socket is used here). That failure is treated as a severe
> unrecoverable error, and the follower then shuts down; see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
>  and
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems that shutting down the follower when it cannot read from the
> leader is the designed behavior? Or, if my understanding is wrong, can you
> please let me know the designed behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zhq527725@gmail.com> wrote:
>
> > Does anyone have any ideas?
> >
> > Regards,
> > Qian Zhang
> >
> >
> > On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zhq527725@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I have a 5-node ZooKeeper cluster. Today the leader became unreachable
> >> due to a hardware issue, and I then found that the 4 followers had simply
> >> shut down. Here are the logs:
> >>
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> >>> java.net.SocketTimeoutException: Read timed out
> >>>     at java.net.SocketInputStream.socketRead0(Native Method)
> >>>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >>>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
> >>>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
> >>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> >>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> >>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >>>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> >>>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> >>>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
> >>> java.net.SocketException: Socket closed
> >>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> >>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >>>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >>>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
> >>>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
> >>>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> >>> java.lang.Exception: shutdown Follower
> >>>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> >>>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
> >>
> >>
> >> I am confused about why all the followers shut down in this case, which
> >> makes the whole ZooKeeper cluster unusable for a short period. Shouldn't
> >> they elect a new leader instead? Thanks!
> >>
> >>
> >> Regards,
> >> Qian Zhang
> >>
> >
>
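
The failure sequence described in the quoted message (a read times out,
the socket is closed, and a later write on the same socket fails) can be
reproduced in isolation with plain java.net sockets. The sketch below is
not ZooKeeper code: the class name ClosedSocketDemo, the 200 ms timeout,
and the "silent leader" stand-in are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical standalone demo of "read timeout, close, then write fails".
public class ClosedSocketDemo {

    // Returns the events observed, in order, as a comma-separated string.
    static String run() throws IOException {
        StringBuilder events = new StringBuilder();
        // Stand-in for an unresponsive leader: it listens but never sends.
        try (ServerSocket silentLeader =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket sock = new Socket(silentLeader.getInetAddress(),
                                     silentLeader.getLocalPort());
            sock.setSoTimeout(200); // assumed read timeout in milliseconds
            InputStream in = sock.getInputStream();
            OutputStream out = sock.getOutputStream();
            try {
                in.read(); // blocks until the timeout fires
            } catch (SocketTimeoutException e) {
                events.append("read-timeout");
                sock.close(); // the reading side gives up and closes
            }
            try {
                // A second user of the same socket now tries to write.
                out.write(1);
                out.flush();
            } catch (SocketException e) {
                events.append(",write-failed");
            } finally {
                sock.close(); // closing an already closed socket is a no-op
            }
        }
        return events.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

The write fails only because the stream belongs to a socket that another
code path has already closed, which matches the "Socket closed" exception
in the quoted log.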
