zookeeper-user mailing list archives

From Abraham Fine <af...@apache.org>
Subject Re: intermittent failures led to node restart
Date Mon, 17 Jul 2017 19:40:38 GMT
Hello-

It looks like there is a timeout during a ping. This is strange because
pings don't need to transfer much data. 

Does the issue always occur with the same learner? Is it possible that
there is a network issue here?
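
One way to start on the "which side gave up first" question is to put the timestamps from both hosts on a single timeline. The following is only a rough sketch: the four log lines are copied from the report quoted below with the syslog prefix trimmed, the helper name parse_stamp is made up for illustration, and it assumes the clocks on the two hosts are reasonably in sync, which may not be the case.

```python
import re
from datetime import datetime

# Log lines taken from the quoted report (syslog prefix trimmed).
# The first element is the hostname shown in each syslog line.
LINES = [
    ("host", "[2017-07-13 12:40:32,797] ERROR Unexpected exception causing shutdown while sock still open"),
    ("host", "[2017-07-13 12:40:32,844] WARN ******* GOODBYE /10.0.0.11:42816 ********"),
    ("host12", "[2017-07-13 12:40:32,500] WARN Exception when following the leader"),
    ("host12", "[2017-07-13 12:40:32,567] WARN PeerState set to LOOKING"),
]

# Matches the bracketed timestamp at the start of each ZooKeeper log line.
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def parse_stamp(line):
    """Return the bracketed timestamp of a log line as a datetime (hypothetical helper)."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in: " + line)
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


# Sort all events from both hosts into one timeline.
events = sorted((parse_stamp(line), host, line) for host, line in LINES)

for stamp, host, line in events:
    print(stamp.strftime("%H:%M:%S.%f")[:-3], host, line.split("] ", 1)[1])
```

If the clocks can be trusted, the follower on host12 logs its broken pipe roughly 300 ms before the leader logs the read timeout. That ordering alone does not say which side is at fault, so it is worth comparing several incidents.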

You may also want to try adjusting syncLimit.
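
For a rough idea of what changing syncLimit does: syncLimit is expressed in ticks, so the time window it grants is tickTime multiplied by syncLimit. The sketch below assumes, as far as I understand the leader code, that this product is also the read timeout on the leader's socket to each learner once the learner is synced. The values 2000 and 5 are only illustrative and are not taken from your configuration, and sync_timeout_ms is a made-up helper name.

```python
def sync_timeout_ms(tick_time_ms, sync_limit):
    """Time window in milliseconds granted by syncLimit (assumed: tickTime * syncLimit)."""
    return tick_time_ms * sync_limit


# Illustrative values only; substitute the tickTime and syncLimit from your zoo.cfg.
TICK_TIME_MS = 2000

for sync_limit in (5, 10):
    window = sync_timeout_ms(TICK_TIME_MS, sync_limit)
    print("syncLimit=%d -> %d ms" % (sync_limit, window))
```

A larger window makes the leader more tolerant of a slow or briefly unreachable follower, at the cost of taking longer to notice one that is really gone.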

Thanks,
Abe

On Mon, Jul 17, 2017, at 01:30, mostolog@gmail.com wrote:
> ping!
> 
> 
> On 14/07/17 13:27, mostolog@gmail.com wrote:
> >
> > Hi
> >
> > Using 3.5.3-beta, from time to time, our zookeeper ensemble leader 
> > commits suicide:
> >
> >     Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,797]
> >     ERROR Unexpected exception causing shutdown while sock still open
> >     (org.apache.zookeeper.server.quorum.LearnerHandler)
> >     Jul 13 12:40:32 host zookeeper[28511]:
> >     java.net.SocketTimeoutException: Read timed out
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.net.SocketInputStream.socketRead0(Native Method)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.net.SocketInputStream.read(SocketInputStream.java:171)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.net.SocketInputStream.read(SocketInputStream.java:141)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     java.io.DataInputStream.readInt(DataInputStream.java:387)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> >     Jul 13 12:40:32 host zookeeper[28511]: #011at
> >     org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515)
> >     Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,844]
> >     WARN ******* GOODBYE /10.0.0.11:42816 ********
> >     (org.apache.zookeeper.server.quorum.LearnerHandler)
> >
> > Another ensemble node also complains (*cause or effect? note the 
> > milliseconds!*):
> >
> >     Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,500]
> >     WARN Exception when following the leader
> >     (org.apache.zookeeper.server.quorum.Learner)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: java.net.SocketException:
> >     Broken pipe (Write failed)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     java.net.SocketOutputStream.socketWrite0(Native Method)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:141)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:620)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:118)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: #011at
> >     org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> >     Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,567]
> >     WARN PeerState set to LOOKING
> >     (org.apache.zookeeper.server.quorum.QuorumPeer)
> >
> > while the third node still continues living happily.
> >
> > Ideas?
> >
> >
> >
> 
