zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
Date Fri, 18 Aug 2017 01:44:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131588#comment-16131588 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:

Github user maoling commented on a diff in the pull request:

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                             numRetries = 0;
                     } catch (IOException e) {
    -                    if (shutdown) {
    -                        break;
    -                    }
                         LOG.error("Exception while listening", e);
    -                    numRetries++;
    +                    if (!(e instanceof SocketTimeoutException)) {
    --- End diff --
    -  Can we reproduce this issue (haha, after 49 days)? In theory this should never happen. According to [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and [tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684), they also didn't find the root cause; they just added some protection against this issue, like [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
    -  One assumption is that ServerSocket.accept() uses a default "infinite" value (2^32 - 1 = 4294967295 ms) when no timeout is specified or when setSoTimeout(0) is called. From the setSoTimeout Javadoc:
        > a call to accept() for this ServerSocket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the ServerSocket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
       That would explain why this issue always happens after 49 days 17 hours (4294967295 / 1000 / 60 / 60 / 24 = 49.7 days).
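    The arithmetic behind the 49 days 17 hours figure can be checked with a few lines of Java. This is only a sketch of the calculation; the class and method names (`TimeoutWrap`, `describe`) are made up for illustration, and it says nothing about whether the JDK or the OS actually uses such a counter internally.

```java
import java.time.Duration;

// Hypothetical helper class, for illustration only.
class TimeoutWrap {
    // 2^32 - 1: the largest value of an unsigned 32-bit millisecond counter.
    static final long MAX_UNSIGNED_32 = (1L << 32) - 1;

    // Break a millisecond count into days, hours, minutes and seconds.
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        long days = d.toDays();
        long hours = d.toHours() % 24;
        long minutes = d.toMinutes() % 60;
        long seconds = d.getSeconds() % 60;
        return days + "d " + hours + "h " + minutes + "m " + seconds + "s";
    }

    public static void main(String[] args) {
        System.out.println(MAX_UNSIGNED_32);           // prints 4294967295
        System.out.println(describe(MAX_UNSIGNED_32)); // prints 49d 17h 2m 47s
    }
}
```

    The result matches the uptime reported in the issue, which is what makes the 32-bit millisecond wrap a plausible (but unconfirmed) explanation.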

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version: 
>            Reporter: Amarjeet Singh
>            Priority: Critical
> The QuorumCnxManager Listener thread blocks on ServerSocket.accept(), but we are getting a SocketTimeoutException on our boxes after 49 days 17 hours. With the current code there are 3 retries, after which it logs "As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $<hostname>/$<ip>:3888". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN  QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $<hostname>/$<ip>:3888'.
>         As no timeout is specified for the ServerSocket, it should never time out, but there are already-discussed issues where people have seen this and added explicit checks for SocketTimeoutException, such as https://issues.apache.org/jira/browse/KARAF-3325
>         I think we need to handle SocketTimeoutException along similar lines in ZooKeeper as well.
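
The handling proposed above can be sketched roughly as follows. This is not the ZooKeeper code: `TolerantListener`, `listen`, `demo`, and the `maxTimeouts` cut-off are hypothetical names added so the example terminates and can be run standalone. The idea it illustrates is that a SocketTimeoutException leaves the ServerSocket valid, so it does not consume the retry budget, while any other IOException does.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch, not the actual QuorumCnxManager.Listener.
class TolerantListener {
    static final int MAX_RETRIES = 3; // assumed to mirror the 3 retries described in the issue

    // Accept connections until the retry budget is spent or, for demo
    // purposes only, until maxTimeouts timeouts have been observed.
    // Returns the number of failures that were counted as retries.
    static int listen(ServerSocket ss, int maxTimeouts) {
        int numRetries = 0;
        int timeouts = 0;
        while (numRetries < MAX_RETRIES && timeouts < maxTimeouts) {
            try (Socket client = ss.accept()) {
                numRetries = 0; // a successful accept resets the budget
            } catch (SocketTimeoutException e) {
                timeouts++;     // socket is still valid: do not count a retry
            } catch (IOException e) {
                numRetries++;   // any other I/O failure consumes a retry
            }
        }
        return numRetries;
    }

    // Runs the loop against a local socket with a short timeout so that
    // accept() times out repeatedly; returns -1 if the socket cannot be opened.
    static int demo() {
        try (ServerSocket ss = new ServerSocket(0)) {
            ss.setSoTimeout(20); // force timeouts quickly for the demo
            return listen(ss, 5);
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("retries counted after 5 timeouts: " + demo());
    }
}
```

With no client connecting, every accept() call in the demo times out, and the counted retries should stay at zero, so the listener would not give up the way the current three-retry logic does.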

This message was sent by Atlassian JIRA
