Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 30148200D3C for ; Tue, 14 Nov 2017 13:27:07 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 2E208160BF4; Tue, 14 Nov 2017 12:27:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 4CFA31609EF for ; Tue, 14 Nov 2017 13:27:06 +0100 (CET) Received: (qmail 60120 invoked by uid 500); 14 Nov 2017 12:27:05 -0000 Mailing-List: contact issues-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@ignite.apache.org Delivered-To: mailing list issues@ignite.apache.org Received: (qmail 60111 invoked by uid 99); 14 Nov 2017 12:27:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 Nov 2017 12:27:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 981F5183525 for ; Tue, 14 Nov 2017 12:27:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.201 X-Spam-Level: X-Spam-Status: No, score=-99.201 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id vUp5W_RsqQUc for ; Tue, 14 Nov 2017 12:27:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id B1E835F3DE for ; Tue, 14 Nov 2017 12:27:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id D728CE031E for ; Tue, 14 Nov 2017 12:27:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5798A23F05 for ; Tue, 14 Nov 2017 12:27:00 +0000 (UTC) Date: Tue, 14 Nov 2017 12:27:00 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@ignite.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (IGNITE-6071) Client may detect necessity for reconnect for too long MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 14 Nov 2017 12:27:07 -0000 [ https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251309#comment-16251309 ] ASF GitHub Bot commented on IGNITE-6071: ---------------------------------------- Github user alamar closed the pull request at: https://github.com/apache/ignite/pull/2903 > Client may detect necessity for reconnect for too long > ------------------------------------------------------ > > Key: IGNITE-6071 > URL: https://issues.apache.org/jira/browse/IGNITE-6071 > Project: Ignite > Issue Type: Bug > Affects Versions: 2.1 > Reporter: Yakov Zhdanov > Assignee: Ilya Kasnacheev > > There was a GC pause on client that caused servers to drop client due to inability to establish TCP communication connection. Then it took some time for client to detect that it has been dropped. During that time client many times attempted to connect to server which can be seen in the logs. After client detected its drop and reconnected servers fired node added event and no log flood can be found any more. > We need to find out why client was reconnecting via communication and did not detect the drop for such a long time. > I hope this can be reproduced in test: > * start 2 servers > * start client > * suspend all client threads with Thread.suspend() - just filter threads of current JVM by name and suspend ones belonging to the client. > {noformat} > [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true] > [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB] > [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null] > [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], crd=false] > [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED, node=dd71479c-41ba-443e-b25c-3803a2a94f4f] > [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474] > [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482] > [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506] > [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522] > [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538] > [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554] > [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570] > [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586] > [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602] > [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618] > [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634] > [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650] > [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666] > ... > [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604] > [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620] > [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636] > [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14, 127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0], discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true] > [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot [ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB] > [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false], customEvt=null] > [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], crd=false] > [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, node=2e80b0f0-21db-451d-a264-34ba16e00ffa] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)