zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Oded (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper
Date Fri, 27 Jul 2018 06:30:01 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559328#comment-16559328
] 

Oded commented on ZOOKEEPER-3036:
---------------------------------

Hi,

We did the same but the issue returned again. It happens to us from time to
time, and we handle it manually.
The main problem is that it caused kafka for "split brain" where the
cluster believes it has more then one controller.





> Unexpected exception in zookeeper
> ---------------------------------
>
>                 Key: ZOOKEEPER-3036
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx
>    Affects Versions: 3.4.10
>         Environment: 3 Zookeepers, 5 kafka servers
>            Reporter: Oded
>            Priority: Critical
>
> We got an issue with one of the zookeeprs (Leader), causing the entire kafka cluster
to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648]
- Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661]
- ******* GOODBYE /192.168.0.91:42490 ********
>  
> We would expect that zookeeper will choose another Leader and the Kafka cluster will
continue to work as expected, but that was not the case.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message