ignite-user mailing list archives

From Ralph Goers <ralph.go...@dslextreme.com>
Subject Node fails to recover
Date Wed, 17 Jan 2018 15:11:44 GMT
We are running an application on two servers, each running an Ignite node in the same cluster.
As the logs below show, at some point the nodes had trouble communicating with each other.
What I would really like to know is why one of the nodes seemed to recover and the other did not.
Is there something I should be looking for, or some setting that might be misconfigured?

Thanks,
Ralph



192.168.202.110
 
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Timed out waiting for message delivery
receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC
and increasing 'ackTimeout' configuration property). Will retry to send message with increased
timeout [currentTimeout=10000, rmtAddr=/192.168.202.111:47500, rmtPort=47500]
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Failed to send message to next node [msg=TcpDiscoveryMetricsUpdateMessage
[super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=50668565061-9d6b5111-c433-4e43-997b-3c803c84ee45,
verifierNodeId=9d6b5111-c433-4e43-997b-3c803c84ee45, topVer=0, pendingIdx=0, failedNodes=null,
isClient=false]], next=TcpDiscoveryNode [id=225ed6e9-3116-465f-bc3a-94818278fd31, addrs=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.202.111], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.202.111:47500],
discPort=47500, order=12, intOrder=7, lastExchangeTime=1513268425829, loc=false, ver=2.3.0#20171027-sha1:8add7fd5,
isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryMetricsUpdateMessage
[super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=50668565061-9d6b5111-c433-4e43-997b-3c803c84ee45,
verifierNodeId=9d6b5111-c433-4e43-997b-3c803c84ee45, topVer=0, pendingIdx=0, failedNodes=null,
isClient=false]], next=ClusterNode [id=225ed6e9-3116-465f-bc3a-94818278fd31, order=12, addr=[0:0:0:0:0:0:0:1%lo,
127.0.0.1, 192.168.202.111], daemon=false]]]
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Local node has detected failed nodes and
started cluster-wide procedure. To speed up failure detection please see 'Failure Detection'
section under javadoc for 'TcpDiscoverySpi'
2018-01-04 22:16:52 WARN  [ ] GridDiscoveryManager:133 - Node FAILED: TcpDiscoveryNode [id=225ed6e9-3116-465f-bc3a-94818278fd31,
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.202.111], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500,
/127.0.0.1:47500, /192.168.202.111:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1513268425829,
loc=false, ver=2.3.0#20171027-sha1:8add7fd5, isClient=false]
2018-01-04 22:16:52 INFO  [ ] GridDiscoveryManager:128 - Topology snapshot [ver=13, servers=1,
clients=0, CPUs=4, heap=2.0GB]
2018-01-04 22:16:52 INFO  [ ] time:128 - Started exchange init [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=225ed6e9-3116-465f-bc3a-94818278fd31,
customEvt=null, allowMerge=true]
2018-01-04 22:16:52 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Finished waiting for partition
release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms, futInfo=NA]
2018-01-04 22:16:52 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Coordinator received all
messages, try merge [ver=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
2018-01-04 22:16:52 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - finishExchangeOnCoordinator
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], resVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0]]
2018-01-04 22:16:53 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Finish exchange future
[startVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], resVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], err=null]
2018-01-04 22:16:53 INFO  [ ] time:128 - Finished exchange init [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], crd=true]
2018-01-04 22:16:53 INFO  [ ] GridCachePartitionExchangeManager:128 - Skipping rebalancing
(nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_FAILED,
node=225ed6e9-3116-465f-bc3a-94818278fd31]
2018-01-04 22:17:00 INFO  [ ] IgniteKernal:128 -
 
192.168.202.111
 
2018-01-04 22:16:12 INFO  [ ] IgniteKernal:128 - FreeList [name=null, buckets=256, dataPages=5,
reusePages=0]
2018-01-04 22:16:12 INFO  [ ] IgniteKernal:128 - FreeList [name=null, buckets=256, dataPages=5,
reusePages=0]
2018-01-04 22:16:52 INFO  [ ] TcpDiscoverySpi:128 - Finished serving remote node connection
[rmtAddr=/192.168.202.110:55327, rmtPort=55327
2018-01-04 22:16:52 INFO  [ ] TcpDiscoverySpi:128 - TCP discovery accepted incoming connection
[rmtAddr=/192.168.202.110, rmtPort=51136]
2018-01-04 22:16:52 INFO  [ ] TcpDiscoverySpi:128 - TCP discovery spawning a new thread for
connection [rmtAddr=/192.168.202.110, rmtPort=51136]
2018-01-04 22:16:52 INFO  [ ] TcpDiscoverySpi:128 - Started serving remote node connection
[rmtAddr=/192.168.202.110:51136, rmtPort=51136]
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Node is out of topology (probably, due
to short-time network problems).
2018-01-04 22:16:52 INFO  [ ] TcpDiscoverySpi:128 - Finished serving remote node connection
[rmtAddr=/192.168.202.110:51136, rmtPort=51136
2018-01-04 22:16:52 WARN  [ ] GridDiscoveryManager:133 - Local node SEGMENTED: TcpDiscoveryNode
[id=225ed6e9-3116-465f-bc3a-94818278fd31, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.202.111],
sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.202.111:47500], discPort=47500,
order=12, intOrder=7, lastExchangeTime=1515129412681, loc=true, ver=2.3.0#20171027-sha1:8add7fd5,
isClient=false]
2018-01-04 22:16:53 WARN  [ ] GridDiscoveryManager:133 - Stopping local node according to
configured segmentation policy.
2018-01-04 22:16:53 WARN  [ ] GridDiscoveryManager:133 - Node FAILED: TcpDiscoveryNode [id=9d6b5111-c433-4e43-997b-3c803c84ee45,
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.202.110], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500,
/127.0.0.1:47500, /192.168.202.110:47500], discPort=47500, order=10, intOrder=6, lastExchangeTime=1513268425880,
loc=false, ver=2.3.0#20171027-sha1:8add7fd5, isClient=false]
2018-01-04 22:16:53 INFO  [ ] GridDiscoveryManager:128 - Topology snapshot [ver=13, servers=1,
clients=0, CPUs=4, heap=2.0GB]
2018-01-04 22:16:53 INFO  [ ] time:128 - Started exchange init [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=9d6b5111-c433-4e43-997b-3c803c84ee45,
customEvt=null, allowMerge=true]
2018-01-04 22:16:53 INFO  [ ] GridTcpRestProtocol:128 - Command protocol successfully stopped:
TCP binary
2018-01-04 22:16:53 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Finished waiting for partition
release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms, futInfo=NA]
2018-01-04 22:16:53 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Coordinator received all
messages, try merge [ver=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
2018-01-04 22:16:53 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - finishExchangeOnCoordinator
[topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], resVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0]]
2018-01-04 22:16:53 INFO  [ ] GridDhtPartitionsExchangeFuture:128 - Finish exchange future
[startVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.IgniteInterruptedCheckedException:
Thread is interrupted: IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1, name=exchange-worker-#42]]
2018-01-04 22:16:53 INFO  [ ] time:128 - Finished exchange init [topVer=AffinityTopologyVersion
[topVer=13, minorTopVer=0], crd=true]
2018-01-04 22:16:53 INFO  [ ] GridCacheProcessor:128 - Stopped cache [cacheName=loginCache]
2018-01-04 22:16:53 INFO  [ ] GridCacheProcessor:128 - Stopped cache [cacheName=sessionCache]
2018-01-04 22:16:53 INFO  [ ] GridCacheProcessor:128 - Stopped cache [cacheName=authCache]
2018-01-04 22:16:53 INFO  [ ] GridCacheProcessor:128 - Stopped cache [cacheName=ignite-sys-cache]
2018-01-04 22:16:53 INFO  [ ] GridCacheProcessor:128 - Stopped cache [cacheName=DirectoryContactCache]
2018-01-04 22:16:54 INFO  [ ] IgniteKernal:128 -
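
To compare the two excerpts, it may help to merge them into a single timeline of WARN entries. The following is only a minimal sketch: the helper names `parse_events` and `warn_timeline` are made up for illustration, the regular expression assumes the log layout shown above (timestamp, level, `[ ]`, logger:line, message), and the sample input contains only the first line of a few entries copied from the logs above.

```python
import re
from datetime import datetime

# Assumed layout: "2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+\[ \] ([\w.]+):\d+ - ?(.*)$"
)


def parse_events(node, text):
    """Return (timestamp, node, level, logger, message) for each entry start."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank line or wrapped continuation of the previous entry
        ts, level, logger, msg = m.groups()
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        events.append((when, node, level, logger, msg))
    return events


def warn_timeline(logs):
    """Merge the entries of all nodes and keep only WARN, ordered by time."""
    merged = []
    for node, text in logs.items():
        merged.extend(parse_events(node, text))
    warns = [e for e in merged if e[2] == "WARN"]
    # sorted() is stable, so entries with equal timestamps keep their input order
    return sorted(warns, key=lambda e: e[0])


# First lines of selected entries, copied from the two excerpts above.
LOGS = {
    "192.168.202.110": """
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Timed out waiting for message delivery
receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC
2018-01-04 22:16:52 WARN  [ ] GridDiscoveryManager:133 - Node FAILED: TcpDiscoveryNode [id=225ed6e9-3116-465f-bc3a-94818278fd31,
2018-01-04 22:16:52 INFO  [ ] GridDiscoveryManager:128 - Topology snapshot [ver=13, servers=1,
""",
    "192.168.202.111": """
2018-01-04 22:16:52 WARN  [ ] TcpDiscoverySpi:133 - Node is out of topology (probably, due
2018-01-04 22:16:52 WARN  [ ] GridDiscoveryManager:133 - Local node SEGMENTED: TcpDiscoveryNode
2018-01-04 22:16:53 WARN  [ ] GridDiscoveryManager:133 - Stopping local node according to
""",
}

timeline = warn_timeline(LOGS)
for when, node, level, logger, msg in timeline:
    print(when.time(), node, logger, msg)
```

Read this way, the last WARN on 192.168.202.111 is the "Stopping local node according to configured segmentation policy" entry, while 192.168.202.110 only reports the other node as failed and continues with a one-server topology.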