From: "Michael Han (JIRA)"
To: dev@zookeeper.apache.org
Reply-To: dev@zookeeper.apache.org
Date: Thu, 24 Aug 2017 16:42:00 +0000 (UTC)
Subject: [jira] [Assigned] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han reassigned ZOOKEEPER-2836:
--------------------------------------

    Assignee: gaoshu

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version: 3.4.6.2.3.2.0-2950
>            Reporter: Amarjeet Singh
>            Assignee: gaoshu
>            Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket accept, but we are getting a SocketTimeoutException on our boxes after 49 days 17 hours.
> As per the current code, there are 3 retries, and after that it logs "As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $/$:3888". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket, it should never time out, but there are already-discussed issues where people have seen this behavior and added explicit checks for SocketTimeoutException, such as https://issues.apache.org/jira/browse/KARAF-3325 .
> I think we need to handle SocketTimeoutException along similar lines for ZooKeeper as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
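
The handling requested in the issue can be sketched roughly as follows. This is a minimal, hypothetical illustration in Java, not ZooKeeper's actual QuorumCnxManager.Listener code: the class TolerantAcceptLoop, the Acceptor interface, and the maxRetries parameter are invented names. It assumes the desired behavior is that a SocketTimeoutException from accept() is retried without counting against the retry budget (3 in the report), while any other IOException still does.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.function.Consumer;

/** Hypothetical sketch only; not taken from the ZooKeeper code base. */
final class TolerantAcceptLoop {

    /** Stand-in for ServerSocket.accept(), so the loop can run without a network. */
    @FunctionalInterface
    interface Acceptor {
        Socket accept() throws IOException;
    }

    private final int maxRetries;              // budget for genuine I/O failures
    private volatile boolean shutdown = false; // set by stop() to end the loop

    TolerantAcceptLoop(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    void stop() {
        shutdown = true;
    }

    /**
     * Accepts connections until stop() is called or maxRetries consecutive
     * non-timeout failures occur. Returns the failure count at exit.
     */
    int run(Acceptor acceptor, Consumer<Socket> handler) {
        int failures = 0;
        while (!shutdown && failures < maxRetries) {
            try {
                Socket client = acceptor.accept();
                failures = 0; // assumption: a successful accept resets the budget
                handler.accept(client);
            } catch (SocketTimeoutException e) {
                // Unexpected timeout on accept(): retry without spending the budget.
                // Must be caught before IOException, which is its superclass.
            } catch (IOException e) {
                failures++; // any other I/O failure counts toward giving up
            }
        }
        return failures;
    }
}
```

With a real listener, the acceptor would be a method reference such as serverSocket::accept. The indirection exists only so the loop can be exercised without opening a port.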