Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 6ADB9200B51 for ; Mon, 1 Aug 2016 09:12:22 +0200 (CEST)
Received: by cust-asf.ponee.io (Postfix) id 699AD160A5D; Mon, 1 Aug 2016 07:12:22 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BC78E160AA7 for ; Mon, 1 Aug 2016 09:12:21 +0200 (CEST)
Received: (qmail 86476 invoked by uid 500); 1 Aug 2016 07:12:21 -0000
Mailing-List: contact issues-help@ignite.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@ignite.apache.org
Delivered-To: mailing list issues@ignite.apache.org
Received: (qmail 86451 invoked by uid 99); 1 Aug 2016 07:12:20 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Aug 2016 07:12:20 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id C02B72C027F for ; Mon, 1 Aug 2016 07:12:20 +0000 (UTC)
Date: Mon, 1 Aug 2016 07:12:20 +0000 (UTC)
From: "Yakov Zhdanov (JIRA)"
To: issues@ignite.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (IGNITE-3606) Node sometimes fails to detect broken connection
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Mon, 01 Aug 2016 07:12:22 -0000

     [ https://issues.apache.org/jira/browse/IGNITE-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yakov Zhdanov updated IGNITE-3606:
----------------------------------
    Description:
Here is a test reproducing the issue: https://github.com/rossdanderson/IgniteDeadlock.
When I run this test I observe the following sequence:
- server starts
- client starts
- server sends 2000 messages to client; on the client node, communication backpressure pauses reads
- server gets a write timeout and closes the socket
- for some reason the client does not detect that the existing connection was broken and thinks that it is still established (most probably because reads are paused and the node does not try to access the connection)
- when the server tries to reconnect, the client sees that a connection is already established and rejects the new one, so the server keeps trying to reconnect and never exits the reconnect loop:
{noformat}
"main" prio=6 tid=0x0000000001f4a000 nid=0x3588 waiting on condition [0x00000000021ed000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7414)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2055)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1540)
{noformat}

  was:
Here is test reproducing issue https://github.com/rossdanderson/IgniteDeadlock.
When I run this test observe this sequence:
- server starts
- client starts
- server sends 2000 messages to client, on client node communication backpressure pauses reads
- server gets write timeout and closes socket
- for some reason client does not detect that existing connection was broken and thinks that connection is still established (most probably because reads are paused and node does not try to access connection)
- when server tries to re-connect then client sees that connection already established and rejects connection, so server constantly tries to reconnect and does not exist from reconnect loop:
{noformat}
"main" prio=6 tid=0x0000000001f4a000 nid=0x3588 waiting on condition [0x00000000021ed000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7414)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2055)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1540)
{noformat}


> Node sometimes fails to detect broken connection
> ------------------------------------------------
>
>                 Key: IGNITE-3606
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3606
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>            Reporter: Semen Boikov
>            Priority: Critical
>             Fix For: 1.8
>
>
> Here is a test reproducing the issue: https://github.com/rossdanderson/IgniteDeadlock.
> When I run this test I observe the following sequence:
> - server starts
> - client starts
> - server sends 2000 messages to client; on the client node, communication backpressure pauses reads
> - server gets a write timeout and closes the socket
> - for some reason the client does not detect that the existing connection was broken and thinks that it is still established (most probably because reads are paused and the node does not try to access the connection)
> - when the server tries to reconnect, the client sees that a connection is already established and rejects the new one, so the server keeps trying to reconnect and never exits the reconnect loop:
> {noformat}
> "main" prio=6 tid=0x0000000001f4a000 nid=0x3588 waiting on condition [0x00000000021ed000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7414)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2055)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
> 	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1540)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
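
The hypothesis in the description (the client misses the close because reads are paused) can be illustrated outside Ignite with plain java.nio. The sketch below is not Ignite code and does not reproduce the linked test. The class name PausedReadsDemo and the method run() are hypothetical, and an empty interest set is only an assumed stand-in for "backpressure pauses reads". On common JDKs a channel registered with no read interest is not reported by the selector when the peer closes; once read interest is restored, the key becomes ready and read() returns -1.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Illustration only; not Ignite code. All names here are hypothetical. */
public class PausedReadsDemo {
    /** Returns {noticedWhilePaused, noticedAfterResume}. */
    public static boolean[] run() throws IOException {
        try (ServerSocketChannel listener = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            // Listen on an ephemeral loopback port.
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // "Client" side: blocking connect, then non-blocking mode for the selector.
            SocketChannel client = SocketChannel.open(listener.getLocalAddress());
            client.configureBlocking(false);

            // Assumption: an empty interest set models "reads are paused".
            SelectionKey key = client.register(selector, 0);

            // "Server" side: accept, then close the socket, as after a write timeout.
            SocketChannel server = listener.accept();
            server.close();

            // With reads paused, the selector has nothing to report for this key.
            boolean noticedWhilePaused = selector.select(500) > 0;

            // Resume reads: the key becomes ready and read() signals end of stream.
            key.interestOps(SelectionKey.OP_READ);
            boolean noticedAfterResume = selector.select(2000) > 0
                && client.read(ByteBuffer.allocate(1)) == -1;

            client.close();
            return new boolean[] {noticedWhilePaused, noticedAfterResume};
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("close noticed while reads paused: " + r[0]);
        System.out.println("close noticed after reads resumed: " + r[1]);
    }
}
```

If this is what happens on the client node, any fix would need the client to learn about the closed socket while reads are paused, or to re-validate the existing connection before rejecting a reconnect. Which approach fits TcpCommunicationSpi is not established by this sketch.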