ignite-user mailing list archives

From yucigou <yuci....@gmail.com>
Subject Local node seems to be disconnected from topology (failure detection timeout is reached)
Date Fri, 05 Aug 2016 11:57:13 GMT
Hello,

One of my Ignite nodes stopped; the logs are appended below. It seems that
grid-timeout-worker checks the health of the cluster every minute. In my
case, however, at 23:34:03, before the next check was due at 23:34:19, the
log reported "Local node seems to be disconnected from topology (failure
detection timeout is reached)", and the Ignite node was stopped. As a
result, web session clustering and everything that depends on it stopped
working.

I just wonder what could have caused this. There should have been no network
issues with the host machine at the time. It is a bit scary for us, as the
same thing could happen to our production servers in the near future.

Thank you for your help.

Yuci

===================Ignite logs======================
[23:31:19,896][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:37:03:793]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=43.17%, avg=12.83%, GC=1.1%]
    ^-- Heap [used=2115MB, free=61.26%, comm=3955MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:32:19,904][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:38:03:801]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=0.83%, avg=12.87%, GC=0%]
    ^-- Heap [used=2638MB, free=51.69%, comm=3957MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:33:19,913][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:39:03:808]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=0.5%, avg=12.86%, GC=0%]
    ^-- Heap [used=796MB, free=85.41%, comm=3921MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:34:03,752][INFO ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local
node seems to be disconnected from topology (failure detection timeout is
reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[23:34:03,783][WARN ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node
is out of topology (probably, due to short-time network problems).
[23:34:03,786][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=9a069f70-d49d-472e-9771-7ac2353e751f, addrs=[10.3.0.64, 127.0.0.1],
sockAddrs=[ves-hx-40.ebi.ac.uk/10.3.0.64:47500, /10.3.0.64:47500,
/127.0.0.1:47500], discPort=47500, order=56, intOrder=29,
lastExchangeTime=1470350043783, loc=true, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[23:34:03,819][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Stopping local node according to configured segmentation policy.
[23:34:03,825][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=cef7fc5e-b854-4072-8e16-396a87d5d556,
addrs=[10.3.0.65, 127.0.0.1],
sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500,
/127.0.0.1:47500], discPort=47500, order=58, intOrder=30,
lastExchangeTime=1470311808664, loc=false, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[23:34:03,827][INFO ][disco-event-worker-#44%null%][GridDiscoveryManager]
Topology snapshot [ver=59, servers=1, clients=0, CPUs=2, heap=5.3GB]
[23:34:03,874][INFO ][Thread-32][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[23:34:03,902][INFO ][Thread-32][GridJettyRestProtocol] Command protocol
successfully stopped: Jetty REST
[23:34:04,571][INFO ][Thread-32][GridCacheProcessor] Stopped cache:
session-cache
[23:34:04,572][INFO ][Thread-32][GridCacheProcessor] Stopped cache:
ignite-marshaller-sys-cache
[23:34:04,572][INFO ][Thread-32][GridCacheProcessor] Stopped cache:
ignite-sys-cache
[23:34:04,573][INFO ][Thread-32][GridCacheProcessor] Stopped cache:
ignite-atomics-sys-cache
[23:34:04,583][INFO ][Thread-32][GridCacheProcessor] Stopped cache:
wicket-data-store
[23:34:04,623][INFO ][Thread-32][IgniteKernal] 

>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 1.6.0#20160518-sha1:0b22c45bb9b97692208fd0705ddf8045ff34a031
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 10:39:48:518
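
The timing in the log above can be checked with a short sketch. It uses only
the Java standard library and assumes nothing about Ignite internals: the
class name LogTiming and the method gapMillis are hypothetical helpers, and
the timestamps and the two numeric settings are copied from the log lines.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class LogTiming {
    // Timestamp layout used in the log prefix, e.g. "23:31:19,896".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds between two same-day log timestamps (hypothetical helper).
    static long gapMillis(String earlier, String later) {
        return Duration.between(
            LocalTime.parse(earlier, FMT),
            LocalTime.parse(later, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // Gaps between consecutive "Metrics for local node" lines.
        System.out.println(gapMillis("23:31:19,896", "23:32:19,904")); // 60008
        System.out.println(gapMillis("23:32:19,904", "23:33:19,913")); // 60009

        // Gap between the last metrics line and the disconnect message.
        System.out.println(gapMillis("23:33:19,913", "23:34:03,752")); // 43839

        // Values copied from the disconnect message. The division by 3 is an
        // assumption inferred from the two logged numbers, not a documented rule.
        long failureDetectionTimeout = 10000;
        System.out.println(failureDetectionTimeout / 3); // 3333, same as connCheckFreq
    }
}
```

The metrics lines are about 60 s apart, which fits a periodic metrics log
(the 'metricsLogFrequency' setting named in those lines). The disconnect
message is logged by a different thread, tcp-disco-msg-worker, about 43.8 s
after the last metrics line, and it quotes failureDetectionTimeout=10000, so
the two timings appear to be independent of each other.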

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.