zookeeper-user mailing list archives

From "mostolog@gmail.com" <mosto...@gmail.com>
Subject Re: intermitent failures led to node restart
Date Tue, 18 Jul 2017 08:23:58 GMT
We are using the latest Docker release, so I don't think this is a network issue.

The cluster, however, handles a lot of traffic, so perhaps the node is "too
busy" to answer in time. Is this possible?

I'll keep an eye on syncLimit.
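For reference: the ZooKeeper Administrator's Guide defines syncLimit in ticks, not milliseconds, so the window a follower gets to stay in sync with the leader is roughly tickTime × syncLimit. Below is a minimal sketch, assuming a simple zoo.cfg made of `key=value` lines. The helper names and the sample values are hypothetical and are not taken from this ensemble's configuration.

```python
def parse_zoo_cfg(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    cfg: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            cfg[key.strip()] = value.strip()
    return cfg


def sync_window_ms(cfg: dict[str, str]) -> int:
    """Approximate sync window: syncLimit (ticks) times tickTime (ms per tick)."""
    tick_time_ms = int(cfg["tickTime"])
    sync_limit_ticks = int(cfg["syncLimit"])
    return tick_time_ms * sync_limit_ticks


# Hypothetical sample values, not the configuration discussed in this thread.
sample = """
tickTime=2000
initLimit=10
syncLimit=5
"""

print(sync_window_ms(parse_zoo_cfg(sample)))  # 10000
```

With these sample values a busy follower has about 10 seconds. Raising syncLimit (or tickTime) widens that window proportionally, at the cost of slower detection of genuinely dead followers.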

Thanks.

PS: Any other replies are welcome too!


On 17/07/17 21:40, Abraham Fine wrote:
> Hello-
>
> It looks like there is a timeout during a ping. This is strange because
> pings don't need to transfer much data.
>
> Does the issue always occur with the same learner? Is it possible that
> there is a network issue here?
>
> You may also want to consider tuning syncLimit.
>
> Thanks,
> Abe
>
> On Mon, Jul 17, 2017, at 01:30, mostolog@gmail.com wrote:
>> ping!
>>
>>
>> On 14/07/17 13:27, mostolog@gmail.com wrote:
>>> Hi
>>>
>>> Using 3.5.3-beta, our ZooKeeper ensemble leader occasionally
>>> commits suicide:
>>>
>>>      Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,797]
>>>      ERROR Unexpected exception causing shutdown while sock still open
>>>      (org.apache.zookeeper.server.quorum.LearnerHandler)
>>>      java.net.SocketTimeoutException: Read timed out
>>>          at java.net.SocketInputStream.socketRead0(Native Method)
>>>          at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>          at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>          at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>          at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>          at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>          at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>          at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>          at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>          at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>          at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:515)
>>>      Jul 13 12:40:32 host zookeeper[28511]: [2017-07-13 12:40:32,844]
>>>      WARN ******* GOODBYE /10.0.0.11:42816 ********
>>>      (org.apache.zookeeper.server.quorum.LearnerHandler)
>>>
>>> Another ensemble node also complains (*cause or effect? notice the
>>> milliseconds!*):
>>>
>>>      Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,500]
>>>      WARN Exception when following the leader
>>>      (org.apache.zookeeper.server.quorum.Learner)
>>>      java.net.SocketException: Broken pipe (Write failed)
>>>          at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
>>>          at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>          at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>          at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:141)
>>>          at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:620)
>>>          at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:118)
>>>          at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
>>>          at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
>>>      Jul 13 12:40:32 host12 zookeeper[26293]: [2017-07-13 12:40:32,567]
>>>      WARN PeerState set to LOOKING
>>>      (org.apache.zookeeper.server.quorum.QuorumPeer)
>>>
>>> while the third node keeps running happily.
>>>
>>> Ideas?
>>>

