zookeeper-dev mailing list archives

From "Aishwarya Soni (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-3036) Unexpected exception in zookeeper
Date Thu, 11 Oct 2018 03:42:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645913#comment-16645913 ]

Aishwarya Soni edited comment on ZOOKEEPER-3036 at 10/11/18 3:41 AM:
---------------------------------------------------------------------

We hit the same issue a couple of days ago. We run ZooKeeper in a containerized AWS
environment and had to restart the affected container to resolve it. The issue
occurs when the port binding fails. When the container becomes unhealthy, it does not
release the port; when it then tries to bind to that port to join the quorum, the
port is still in use, so it throws the exception *Unexpected exception
causing shutdown while sock still open*.

This is where the binding happens, in the QuorumCnxManager class in ZooKeeper:
*ss.socket().bind(new InetSocketAddress(port));*
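
To illustrate the kind of failure described here, below is a minimal standalone sketch (not ZooKeeper code) that uses the same `ss.socket().bind(...)` pattern. The class name `BindConflictDemo` and the method `secondBindFails` are hypothetical, chosen only for this example. It assumes a second bind to a port that is already held by a listening socket is rejected with `java.net.BindException`, which is what the JDK normally reports as "Address already in use".

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class, not part of ZooKeeper.
public class BindConflictDemo {

    /** Returns true if binding a second channel to an already-bound port fails. */
    static boolean secondBindFails() throws IOException {
        try (ServerSocketChannel first = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port, so the demo does not
            // depend on a particular port being available.
            first.socket().bind(new InetSocketAddress(0));
            int port = first.socket().getLocalPort();

            try (ServerSocketChannel second = ServerSocketChannel.open()) {
                // Same call shape as the line quoted above; the port is
                // still held by 'first', so this bind is expected to fail.
                second.socket().bind(new InetSocketAddress(port));
                return false;
            } catch (BindException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```

If a stale process inside the container is still holding the port, a check like this run against the real port would be one way to confirm it before restarting.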

In the LearnerHandler.java class, it tries to access the port, and because the port is still in use,
it throws the exception:

*if (sock != null && !sock.isClosed()) {LOG.error("Unexpected exception causing shutdown while sock " + "still open", e);*

In most cases, the socket (sock) will not be null.
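
For reference, the stack trace in the original report shows `java.net.SocketTimeoutException: Read timed out` reaching that check. The following is a minimal standalone sketch (not the LearnerHandler implementation) of how a read timeout can surface while the socket is still open. The class `ReadTimeoutDemo` and the method `readWithTimeout` are hypothetical names for this example only; the peer here simply never writes, which is an assumption made to force the timeout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of ZooKeeper.
public class ReadTimeoutDemo {

    /** Accepts a silent peer, then tries to read an int with a read timeout. */
    static String readWithTimeout(int timeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket sock = server.accept()) {

            // Reads on this socket block for at most timeoutMs milliseconds.
            sock.setSoTimeout(timeoutMs);
            try {
                // The client never sends anything, so this read times out.
                new DataInputStream(sock.getInputStream()).readInt();
                return "read ok";
            } catch (SocketTimeoutException e) {
                // Same condition as the check quoted above: the exception
                // arrives while the socket itself is still open.
                if (sock != null && !sock.isClosed()) {
                    return "timeout while sock still open";
                }
                return "timeout after close";
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithTimeout(200));
    }
}
```

In this sketch the timeout comes from a peer that stays connected but silent, so the socket is open and non-null when the exception is caught.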


> Unexpected exception in zookeeper
> ---------------------------------
>
>                 Key: ZOOKEEPER-3036
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.4.10
>         Environment: 3 Zookeepers, 5 kafka servers
>            Reporter: Oded
>            Priority: Critical
>
> We got an issue with one of the ZooKeepers (the Leader), causing the entire Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE /192.168.0.91:42490 ********
>  
> We would expect ZooKeeper to elect another Leader and the Kafka cluster to continue working as expected, but that was not the case.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
