From Prasad Bhalerao <prasadbhalerao1...@gmail.com>
Subject How to debug network issues in cluster
Date Sun, 06 Jan 2019 14:03:47 GMT

I am consistently getting the "Node is out of topology" message in the logs
on node-1, while the other node, node-2, logs "Timed out waiting for
message delivery receipt (most probably, the reason is in long GC pauses on
remote node; consider tuning GC and increasing '"

I measured the network bandwidth between the nodes with iperf and it is
470 Mbit/s. I also checked the GC logs, and the maximum pause time is 140 ms.

If it is really happening because of network issues, is there any way to
debug it?
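
One thing iperf does not show is short connection stalls, since it reports
average throughput. A minimal sketch of a probe that times repeated TCP
connects from one node to the other's discovery port could look like the
following. The function names, the sample count and the interval are
assumptions made for illustration; 47500 is the discPort from the log below.
Each probe will most likely show up as an extra "accepted incoming
connection" entry in the target node's log.

```python
import socket
import statistics
import time


def probe_once(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return None
    return (time.perf_counter() - start) * 1000.0


def probe(host, port, count=60, interval=1.0, timeout=2.0):
    """Probe repeatedly; return (latencies_ms, number_of_failures)."""
    latencies, failures = [], 0
    for i in range(count):
        ms = probe_once(host, port, timeout)
        if ms is None:
            failures += 1
        else:
            latencies.append(ms)
        if i + 1 < count:
            time.sleep(interval)
    return latencies, failures


def summarize(latencies, failures):
    """Format the probe results as a one-line summary."""
    if not latencies:
        return f"no successful connects, {failures} failures"
    return (f"n={len(latencies)} failures={failures} "
            f"median={statistics.median(latencies):.1f} ms "
            f"max={max(latencies):.1f} ms")
```

Running print(summarize(*probe("node-2", 47500))) on node-1 (with "node-2"
replaced by the real host name) around the time the warnings appear would
show whether connects fail or take unusually long.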

If it were happening because of GC, I would expect to see it in the GC logs.
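
To double-check the GC logs, a small script can list every pause above a
threshold. This is only a sketch: it assumes JDK 8 style log lines, that is,
the "real=... secs" field printed with -XX:+PrintGCDetails and the "Total
time for which application threads were stopped" lines printed with
-XX:+PrintGCApplicationStoppedTime. Other JVM versions and flags produce
different formats, and the function names are made up for illustration.

```python
import re

# Wall-clock time of a GC event, e.g. "[Times: user=0.10 sys=0.00, real=0.14 secs]"
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")
# Time application threads were stopped at a safepoint (GC or otherwise)
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def pauses_ms(lines):
    """Yield (pause_in_ms, line) for every line that reports a pause."""
    for line in lines:
        for pattern in (REAL_RE, STOPPED_RE):
            match = pattern.search(line)
            if match:
                yield float(match.group(1)) * 1000.0, line.rstrip("\n")
                break


def long_pauses(lines, threshold_ms):
    """Return the pauses that lasted at least threshold_ms milliseconds."""
    return [(ms, line) for ms, line in pauses_ms(lines) if ms >= threshold_ms]
```

For example, long_pauses(open("gc.log"), 1000) would return all pauses of one
second or more, where "gc.log" is a placeholder for the actual log file.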

Can someone please help me out with this?

Log messages on node-1:
2019-01-06 13:48:19,036 125016 [tcp-disco-srvr-#3%springDataNode%] INFO
o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery accepted incoming connection
[rmtAddr=/, rmtPort=35651]
2019-01-06 13:48:19,037 125017 [tcp-disco-srvr-#3%springDataNode%] INFO
o.a.i.s.d.tcp.TcpDiscoverySpi - TCP discovery spawning a new thread for
connection [rmtAddr=/, rmtPort=35651]
2019-01-06 13:48:19,037 125017 [tcp-disco-sock-reader-#5%springDataNode%]
INFO  o.a.i.s.d.tcp.TcpDiscoverySpi - Started serving remote node
connection [rmtAddr=/, rmtPort=35651]
*2019-01-06 13:48:19,040 125020 [tcp-disco-msg-worker-#2%springDataNode%]
WARN  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably,
due to short-time network problems).*
2019-01-06 13:48:19,041 125021 [disco-event-worker-#62%springDataNode%]
WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED:
TcpDiscoveryNode [id=a5827f51-096a-4c98-af4f-564d2d3e769d,
addrs=[,], sockAddrs=[/,
qagmscore02.p13.eng.in03.qualys.com/], discPort=47500,
order=2, intOrder=2, lastExchangeTime=1546782499034, loc=true,
ver=2.7.0#20181130-sha1:256ae401, isClient=false]
2019-01-06 13:48:19,041 125021 [tcp-disco-sock-reader-#5%springDataNode%]
INFO  o.a.i.s.d.tcp.TcpDiscoverySpi - Finished serving remote node
connection [rmtAddr=/, rmtPort=35651]
2019-01-06 13:48:19,866 125846 [tcp-comm-worker-#1%springDataNode%] INFO
o.a.i.s.d.tcp.TcpDiscoverySpi - Pinging node:

