ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ilya Kasnacheev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-6071) Client may detect necessity for reconnect for too long
Date Thu, 31 Aug 2017 16:11:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149185#comment-16149185
] 

Ilya Kasnacheev commented on IGNITE-6071:
-----------------------------------------

I don't understand what were the circumstances that led to the problem, however I have reproduced
it as a wider issue.

I've wrote a letter to dev@ titled "Hopeless looping in TcpCommunicationSpi"

> Client may detect necessity for reconnect for too long
> ------------------------------------------------------
>
>                 Key: IGNITE-6071
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6071
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Yakov Zhdanov
>            Assignee: Ilya Kasnacheev
>
> There was a GC pause on client that caused servers to drop client due to inability to
establish TCP communication connection. Then it took some time for client to detect that it
has been dropped. During that time client many times attempted to connect to server which
can be seen in the logs. After client detected its drop and reconnected servers fired node
added event and no log flood can be found any more.
> We need to find out why client was reconnecting via communication and did not detect
the drop for such a long time.
> I hope this can be reproduced in test:
> * start 2 servers
> * start client
> * suspend all client threads with Thread.suspend() - just filter threads of current JVM
by name and suspend ones belonging to the client.
> {noformat}
> [10:12:24,785][WARNING][disco-event-worker-#71%null%][GridDiscoveryManager] Node FAILED:
TcpDiscoveryNode [id=dd71479c-41ba-443e-b25c-3803a2a94f4f, addrs=[10.44.3.14, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, XXX.com/10.44.3.14:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1502269008673,
loc=false, ver=2.1.1#20170618-sha1:09ce29e0, isClient=true]
> [10:12:24,785][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot
[ver=5, servers=2, clients=1, CPUs=144, heap=76.0GB]
> [10:12:24,794][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion
[topVer=5, minorTopVer=0], crd=false, evt=12, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11,
127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3,
intOrder=3, lastExchangeTime=1502269944782, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false],
customEvt=null]
> [10:12:24,813][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion
[topVer=5, minorTopVer=0], crd=false]
> [10:12:24,819][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping
rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_FAILED,
node=dd71479c-41ba-443e-b25c-3803a2a94f4f]
> [10:12:28,344][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52474]
> [10:12:28,348][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52482]
> [10:12:28,356][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52506]
> [10:12:28,362][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52522]
> [10:12:28,368][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52538]
> [10:12:28,374][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52554]
> [10:12:28,380][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52570]
> [10:12:28,386][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52586]
> [10:12:28,392][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52602]
> [10:12:28,397][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52618]
> [10:12:28,402][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52634]
> [10:12:28,407][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52650]
> [10:12:28,412][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:52666]
> ...
> [10:18:32,684][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43604]
> [10:18:32,690][INFO][grid-nio-worker-tcp-comm-1-#58%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43620]
> [10:18:32,695][INFO][grid-nio-worker-tcp-comm-0-#57%null%][TcpCommunicationSpi] Accepted
incoming communication connection [locAddr=/10.44.3.11:47100, rmtAddr=/10.44.3.14:43636]
> [10:18:42,831][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Added new node
to topology: TcpDiscoveryNode [id=2e80b0f0-21db-451d-a264-34ba16e00ffa, addrs=[10.44.3.14,
127.0.0.1], sockAddrs=[/127.0.0.1:0, gbrdsr000002837.intranet.barcapint.com/10.44.3.14:0],
discPort=0, order=6, intOrder=5, lastExchangeTime=1502270322805, loc=false, ver=2.1.1#20170618-sha1:09ce29e0,
isClient=true]
> [10:18:42,832][INFO][disco-event-worker-#71%null%][GridDiscoveryManager] Topology snapshot
[ver=6, servers=2, clients=2, CPUs=144, heap=90.0GB]
> [10:18:42,833][INFO][exchange-worker-#72%null%][time] Started exchange init [topVer=AffinityTopologyVersion
[topVer=6, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d,
addrs=[10.44.3.11, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500,
order=3, intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0,
isClient=false], evtNode=TcpDiscoveryNode [id=98c1fdf7-09db-4fa0-bb01-8ca7f046643d, addrs=[10.44.3.11,
127.0.0.1], sockAddrs=[/127.0.0.1:47500, XXX.com/10.44.3.11:47500], discPort=47500, order=3,
intOrder=3, lastExchangeTime=1502270322815, loc=true, ver=2.1.1#20170618-sha1:09ce29e0, isClient=false],
customEvt=null]
> [10:18:42,851][INFO][exchange-worker-#72%null%][time] Finished exchange init [topVer=AffinityTopologyVersion
[topVer=6, minorTopVer=0], crd=false]
> [10:18:42,855][INFO][exchange-worker-#72%null%][GridCachePartitionExchangeManager] Skipping
rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED,
node=2e80b0f0-21db-451d-a264-34ba16e00ffa]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message