zookeeper-dev mailing list archives

From "Amarjeet Singh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
Date Tue, 04 Jul 2017 11:51:00 GMT
Amarjeet Singh created ZOOKEEPER-2836:

             Summary: QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
                 Key: ZOOKEEPER-2836
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, quorum
    Affects Versions: 3.4.6
         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
Java Version: jdk64/jdk1.8.0_40
zookeeper version: 
            Reporter: Amarjeet Singh
            Priority: Critical

The QuorumCnxManager.Listener thread blocks on ServerSocket accept, but on our boxes we are
getting a SocketTimeoutException after 49 days 17 hours. The current code retries 3 times and
then logs "As I'm leaving the listener thread, I won't be able to participate in leader
election any longer: $<hostname>/$<ip>:3888". Once server nodes reach this state, any node
that we restart or add fails to join the cluster and logs
'WARN  QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3
at election address $<hostname>/$<ip>:3888'.

        As no timeout is specified for the ServerSocket, accept should never time out, but
other projects have already discussed this problem and added explicit checks for
SocketTimeoutException, for example https://issues.apache.org/jira/browse/KARAF-3325 .

        I think we need to handle SocketTimeoutException along similar lines in ZooKeeper as well.
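
        To illustrate the kind of handling proposed here, below is a minimal standalone
sketch. It is not the QuorumCnxManager.Listener code. The class TolerantAcceptLoop, the
method acceptLoop and the constant MAX_RETRIES are hypothetical names, and the retry budget
of 3 only mirrors the retry count described above. The idea is that a SocketTimeoutException
from accept is caught separately and does not use up a retry, so the listener keeps running.
The demo forces timeouts with setSoTimeout, which is an assumption made only to reproduce the
exception quickly.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TolerantAcceptLoop {
    // Hypothetical retry budget, mirroring the 3 retries described in the report.
    static final int MAX_RETRIES = 3;

    /**
     * Accepts connections until `wanted` connections have been accepted, or
     * until MAX_RETRIES consecutive non-timeout I/O failures have occurred.
     * Returns the number of connections accepted.
     */
    static int acceptLoop(ServerSocket ss, int wanted) {
        int accepted = 0;
        int failures = 0;
        while (accepted < wanted && failures < MAX_RETRIES) {
            try (Socket client = ss.accept()) {
                accepted++;
                failures = 0; // a successful accept resets the retry budget
            } catch (SocketTimeoutException e) {
                // Timeout on accept: keep listening and do NOT count it
                // against the retry budget.
            } catch (IOException e) {
                failures++; // other I/O errors still consume a retry
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            ss.setSoTimeout(50); // demo only: provoke SocketTimeoutException on accept
            final int port = ss.getLocalPort();
            // Connect only after several accept timeouts have already happened.
            Thread client = new Thread(() -> {
                try {
                    Thread.sleep(400);
                    new Socket(InetAddress.getLoopbackAddress(), port).close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();
            int accepted = acceptLoop(ss, 1);
            client.join();
            System.out.println("accepted=" + accepted);
        }
    }
}
```

        In this sketch the loop survives more timeouts than the retry budget allows for
other errors and still accepts the late connection. Whether timeouts should be ignored
indefinitely or merely logged and counted separately is a design choice left open here.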
