ignite-user mailing list archives

From yucigou <yuci....@gmail.com>
Subject Re: Local node seems to be disconnected from topology (failure detection timeout is reached)
Date Wed, 10 Aug 2016 13:46:18 GMT
It happened again.

(1) Ignite logs:

[08:14:01,990][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7faa417f, name=null, uptime=14:52:13:438]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=16.5%, avg=17.81%, GC=1.6%]
    ^-- Heap [used=625MB, free=89.82%, comm=2047MB]
    ^-- Non heap [used=93MB, free=-1%, comm=96MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[08:15:02,008][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7faa417f, name=null, uptime=14:53:13:457]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=18.67%, avg=17.81%, GC=1.63%]
    ^-- Heap [used=373MB, free=93.92%, comm=2047MB]
    ^-- Non heap [used=93MB, free=-1%, comm=96MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[08:16:00,861][INFO ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local
node seems to be disconnected from topology (failure detection timeout is
reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[08:16:00,900][WARN ][grid-nio-worker-0-#36%null%][TcpCommunicationSpi]
Failed to process selector key (will close): GridSelectorNioSessionImpl
[selectorIdx=0, queueSize=0, writeBuf=java.nio.DirectByteBuffer[pos=0
lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
cap=32768], recovery=GridNioRecoveryDescriptor [acked=17892, resendCnt=0,
rcvCnt=18348, sentCnt=17901, reserved=true, lastAck=18348, nodeLeft=false,
node=TcpDiscoveryNode [id=358fcad3-1e47-45df-8cd4-875bdcd7011a,
addrs=[10.3.0.65, 127.0.0.1],
sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500,
/127.0.0.1:47500], discPort=47500, order=4, intOrder=3,
lastExchangeTime=1470759707287, loc=false, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false], connected=true, connectCnt=0, queueLimit=5120],
super=GridNioSessionImpl [locAddr=/10.3.0.64:47100,
rmtAddr=/10.3.0.65:36219, createTime=1470759707458, closeTime=0,
bytesSent=9992093, bytesRcvd=5477037971, sndSchedTime=1470813320058,
lastSndTime=1470813320341, lastRcvTime=1470813320068, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@276d0d91, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]
[08:16:00,904][WARN ][grid-nio-worker-0-#36%null%][TcpCommunicationSpi]
Closing NIO session because of unhandled exception [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[08:16:00,919][WARN ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node
is out of topology (probably, due to short-time network problems).
[08:16:00,926][WARN ][disco-event-worker-#45%null%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=7faa417f-9760-434d-bc66-6551611ebf42, addrs=[10.3.0.64, 127.0.0.1],
sockAddrs=[ves-hx-40.ebi.ac.uk/10.3.0.64:47500, /10.3.0.64:47500,
/127.0.0.1:47500], discPort=47500, order=6, intOrder=4,
lastExchangeTime=1470813360913, loc=true, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[08:16:01,104][WARN ][disco-event-worker-#45%null%][GridDiscoveryManager]
Stopping local node according to configured segmentation policy.
[08:16:01,109][WARN ][disco-event-worker-#45%null%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=358fcad3-1e47-45df-8cd4-875bdcd7011a,
addrs=[10.3.0.65, 127.0.0.1],
sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500,
/127.0.0.1:47500], discPort=47500, order=4, intOrder=3,
lastExchangeTime=1470759707287, loc=false, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[08:16:01,113][INFO ][disco-event-worker-#45%null%][GridDiscoveryManager]
Topology snapshot [ver=7, servers=1, clients=0, CPUs=2, heap=6.0GB]
[08:16:01,150][INFO ][Thread-31][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[08:16:02,641][INFO ][Thread-31][GridJettyRestProtocol] Command protocol
successfully stopped: Jetty REST
[08:16:02,685][INFO ][Thread-31][GridCacheProcessor] Stopped cache:
session-cache
[08:16:02,687][INFO ][Thread-31][GridCacheProcessor] Stopped cache:
ignite-marshaller-sys-cache
[08:16:02,687][INFO ][Thread-31][GridCacheProcessor] Stopped cache:
ignite-sys-cache
[08:16:02,688][INFO ][Thread-31][GridCacheProcessor] Stopped cache:
ignite-atomics-sys-cache

(2) GC logs:

2016-08-10T08:08:11.335+0100: 53191.745: [GC (Allocation Failure)
2016-08-10T08:08:11.335+0100: 53191.745: [ParNew: 130944K->0K(131008K),
0.0430220 secs] 507259K->395398K(2097088K), 0.0432639 secs] [Times:
user=0.08 sys=0.00, real=0.04 secs] 
2016-08-10T08:08:11.760+0100: 53192.169: [GC (Allocation Failure)
2016-08-10T08:08:11.760+0100: 53192.169: [ParNew: 130944K->0K(131008K),
0.0496240 secs] 526342K->408993K(2097088K), 0.0498060 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 
2016-08-10T08:08:11.964+0100: 53192.374: [GC (Allocation Failure)
2016-08-10T08:08:11.964+0100: 53192.374: [ParNew: 130944K->0K(131008K),
0.0200889 secs] 539937K->411803K(2097088K), 0.0203260 secs] [Times:
user=0.04 sys=0.00, real=0.03 secs] 
2016-08-10T08:08:14.370+0100: 53194.780: [GC (Allocation Failure)
2016-08-10T08:08:14.370+0100: 53194.780: [ParNew: 130944K->0K(131008K),
0.0288092 secs] 542747K->431458K(2097088K), 0.0289763 secs] [Times:
user=0.06 sys=0.00, real=0.03 secs] 
2016-08-10T08:08:14.715+0100: 53195.125: [GC (Allocation Failure)
2016-08-10T08:08:14.715+0100: 53195.125: [ParNew: 130944K->0K(131008K),
0.0512209 secs] 562402K->445397K(2097088K), 0.0514020 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 
2016-08-10T08:08:14.889+0100: 53195.299: [GC (Allocation Failure)
2016-08-10T08:08:14.889+0100: 53195.299: [ParNew: 130944K->0K(131008K),
0.0139681 secs] 576341K->447351K(2097088K), 0.0141538 secs] [Times:
user=0.02 sys=0.00, real=0.01 secs] 
2016-08-10T08:08:17.371+0100: 53197.781: [GC (Allocation Failure)
2016-08-10T08:08:17.371+0100: 53197.781: [ParNew: 130944K->0K(131008K),
0.0294319 secs] 578295K->467533K(2097088K), 0.0296133 secs] [Times:
user=0.06 sys=0.00, real=0.03 secs] 
2016-08-10T08:08:17.716+0100: 53198.126: [GC (Allocation Failure)
2016-08-10T08:08:17.717+0100: 53198.126: [ParNew: 130944K->0K(131008K),
0.0559223 secs] 598477K->481546K(2097088K), 0.0561051 secs] [Times:
user=0.11 sys=0.00, real=0.05 secs] 
2016-08-10T08:08:17.829+0100: 53198.238: [GC (Allocation Failure)
2016-08-10T08:08:17.829+0100: 53198.239: [ParNew: 130944K->0K(131008K),
0.0105030 secs] 612490K->483049K(2097088K), 0.0106662 secs] [Times:
user=0.02 sys=0.00, real=0.01 secs] 
2016-08-10T08:08:20.374+0100: 53200.783: [GC (Allocation Failure)
2016-08-10T08:08:20.374+0100: 53200.783: [ParNew: 130944K->0K(131008K),
0.0348846 secs] 613993K->503361K(2097088K), 0.0351087 secs] [Times:
user=0.06 sys=0.00, real=0.04 secs] 
2016-08-10T08:08:20.830+0100: 53201.239: [GC (Allocation Failure)
2016-08-10T08:08:20.830+0100: 53201.240: [ParNew: 130944K->0K(131008K),
0.0613934 secs] 634305K->517255K(2097088K), 0.0618376 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
2016-08-10T08:08:20.977+0100: 53201.386: [GC (Allocation Failure)
2016-08-10T08:08:20.977+0100: 53201.386: [ParNew: 130944K->0K(131008K),
0.0100182 secs] 648199K->518802K(2097088K), 0.0101398 secs] [Times:
user=0.02 sys=0.00, real=0.01 secs] 
2016-08-10T08:08:23.450+0100: 53203.859: [GC (Allocation Failure)
2016-08-10T08:08:23.450+0100: 53203.859: [ParNew: 130944K->0K(131008K),
0.0337522 secs] 649746K->539495K(2097088K), 0.0339619 secs] [Times:
user=0.06 sys=0.00, real=0.03 secs] 
2016-08-10T08:08:23.878+0100: 53204.288: [GC (Allocation Failure)
2016-08-10T08:08:23.878+0100: 53204.288: [ParNew: 130944K->0K(131008K),
0.0497076 secs] 670439K->552950K(2097088K), 0.0498960 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-10T08:08:23.994+0100: 53204.403: [GC (Allocation Failure)
2016-08-10T08:08:23.994+0100: 53204.403: [ParNew: 130944K->0K(131008K),
0.0129457 secs] 683894K->557057K(2097088K), 0.0131245 secs] [Times:
user=0.03 sys=0.00, real=0.02 secs] 
2016-08-10T08:08:26.403+0100: 53206.813: [GC (Allocation Failure)
2016-08-10T08:08:26.403+0100: 53206.813: [ParNew: 130944K->0K(131008K),
0.0382209 secs] 688001K->578332K(2097088K), 0.0384684 secs] [Times:
user=0.07 sys=0.00, real=0.04 secs] 
2016-08-10T08:08:26.838+0100: 53207.248: [GC (Allocation Failure)
2016-08-10T08:08:26.838+0100: 53207.248: [ParNew: 130944K->0K(131008K),
0.0450175 secs] 709276K->591347K(2097088K), 0.0452092 secs] [Times:
user=0.09 sys=0.00, real=0.04 secs] 
2016-08-10T08:08:26.948+0100: 53207.358: [GC (Allocation Failure)
2016-08-10T08:08:26.948+0100: 53207.358: [ParNew: 130944K->0K(131008K),
0.0114625 secs] 722291K->595446K(2097088K), 0.0115929 secs] [Times:
user=0.02 sys=0.00, real=0.01 secs] 
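
One way to check whether a GC pause could account for the disconnect is to pull the wall-clock ("real=") time out of every GC record and compare it with the failureDetectionTimeout=10000 reported in the Ignite log. The following is a minimal sketch, not a definitive tool: it assumes the log keeps the ParNew format shown above, and the function names and the short sample (two records copied from the excerpt) are illustrative only. Note also that the excerpt above covers 08:08:11 to 08:08:26, while the disconnect was logged at 08:16:00, so the records around that time would need to be checked as well.

```python
import re

# Matches the "real=0.04 secs" field that closes each GC record in the
# ParNew format shown above (assumption: other collectors may log differently).
REAL_PAUSE = re.compile(r"real=(\d+\.\d+) secs")

# Value taken from the Ignite log line "[failureDetectionTimeout=10000, ...]".
FAILURE_DETECTION_TIMEOUT_MS = 10000


def gc_pauses_ms(log_text):
    """Return the wall-clock time of every GC record, in milliseconds."""
    return [float(m.group(1)) * 1000.0 for m in REAL_PAUSE.finditer(log_text)]


def pauses_over(log_text, threshold_ms):
    """Return only the pauses that are at least threshold_ms long."""
    return [p for p in gc_pauses_ms(log_text) if p >= threshold_ms]


# Two records copied from the excerpt above, used here as sample input.
sample = """2016-08-10T08:08:11.335+0100: 53191.745: [GC (Allocation Failure)
2016-08-10T08:08:11.335+0100: 53191.745: [ParNew: 130944K->0K(131008K),
0.0430220 secs] 507259K->395398K(2097088K), 0.0432639 secs] [Times:
user=0.08 sys=0.00, real=0.04 secs]
2016-08-10T08:08:20.830+0100: 53201.239: [GC (Allocation Failure)
2016-08-10T08:08:20.830+0100: 53201.240: [ParNew: 130944K->0K(131008K),
0.0613934 secs] 634305K->517255K(2097088K), 0.0618376 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs]
"""

pauses = gc_pauses_ms(sample)
print("records parsed:", len(pauses))
print("longest pause (ms):", max(pauses))
print("pauses >= timeout:", pauses_over(sample, FAILURE_DETECTION_TIMEOUT_MS))
```

To run it against a full GC log, read the file into a string and pass it to pauses_over in place of the sample. Every pause in the excerpt above is well under 0.1 seconds, far below the 10-second timeout.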

Thanks for your kind help!
Yuci



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797p6924.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
