ignite-user mailing list archives

From yucigou <yuci....@gmail.com>
Subject Re: Local node seems to be disconnected from topology (failure detection timeout is reached)
Date Thu, 11 Aug 2016 11:54:18 GMT
It happened again: the Ignite node on ves-hx-40 went down. The other node,
ves-hx-41, has been running fine, and the network should not be an issue.

(1) Ignite logs on ves-hx-40: 

[01:03:11,571][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7ba70b8b, name=null, uptime=08:29:07:892]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=12.83%, avg=12.96%, GC=0.4%]
    ^-- Heap [used=1040MB, free=83.06%, comm=2047MB]
    ^-- Non heap [used=133MB, free=-1%, comm=139MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[01:04:11,587][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=7ba70b8b, name=null, uptime=08:30:07:908]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=14.5%, avg=12.96%, GC=0.4%]
    ^-- Heap [used=1036MB, free=83.14%, comm=2047MB]
    ^-- Non heap [used=133MB, free=-1%, comm=139MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[01:05:00,007][INFO ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local
node seems to be disconnected from topology (failure detection timeout is
reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[01:05:00,031][WARN ][grid-nio-worker-0-#36%null%][TcpCommunicationSpi]
Failed to process selector key (will close): GridSelectorNioSessionImpl
[selectorIdx=0, queueSize=0, writeBuf=java.nio.DirectByteBuffer[pos=0
lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
cap=32768], recovery=GridNioRecoveryDescriptor [acked=2266, resendCnt=0,
rcvCnt=1764, sentCnt=2267, reserved=true, lastAck=1764, nodeLeft=false,
node=TcpDiscoveryNode [id=3bc24639-7d87-44da-a585-59093dd08955,
addrs=[10.3.0.65, 127.0.0.1],
sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500,
/127.0.0.1:47500], discPort=47500, order=12, intOrder=7,
lastExchangeTime=1470843764981, loc=false, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false], connected=true, connectCnt=1, queueLimit=5120],
super=GridNioSessionImpl [locAddr=/10.3.0.64:54485,
rmtAddr=ves-hx-41.ebi.ac.uk/10.3.0.65:47100, createTime=1470843765245,
closeTime=0, bytesSent=33838638, bytesRcvd=31435858,
sndSchedTime=1470873900018, lastSndTime=1470873900018,
lastRcvTime=1470873854689, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@25c570a4, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false]]
[01:05:00,040][WARN ][grid-nio-worker-0-#36%null%][TcpCommunicationSpi]
Closing NIO session because of unhandled exception [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[01:05:00,054][WARN ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node
is out of topology (probably, due to short-time network problems).
[01:05:00,058][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Local node SEGMENTED: TcpDiscoveryNode
[id=7ba70b8b-44a8-4203-a229-cd2e3f286747, addrs=[10.3.0.64, 127.0.0.1],
sockAddrs=[ves-hx-40.ebi.ac.uk/10.3.0.64:47500, /10.3.0.64:47500,
/127.0.0.1:47500], discPort=47500, order=8, intOrder=5,
lastExchangeTime=1470873900048, loc=true, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[01:05:00,118][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Stopping local node according to configured segmentation policy.
[01:05:00,121][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=3bc24639-7d87-44da-a585-59093dd08955,
addrs=[10.3.0.65, 127.0.0.1],
sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500,
/127.0.0.1:47500], discPort=47500, order=12, intOrder=7,
lastExchangeTime=1470843764981, loc=false, ver=1.6.0#20160518-sha1:0b22c45b,
isClient=false]
[01:05:00,124][INFO ][disco-event-worker-#44%null%][GridDiscoveryManager]
Topology snapshot [ver=13, servers=1, clients=0, CPUs=2, heap=6.0GB]
[01:05:00,153][INFO ][Thread-33][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[01:05:00,196][INFO ][Thread-33][GridJettyRestProtocol] Command protocol
successfully stopped: Jetty REST
[01:05:00,985][INFO ][Thread-33][GridCacheProcessor] Stopped cache:
session-cache
[01:05:00,986][INFO ][Thread-33][GridCacheProcessor] Stopped cache:
ignite-marshaller-sys-cache
[01:05:00,986][INFO ][Thread-33][GridCacheProcessor] Stopped cache:
ignite-sys-cache
[01:05:00,987][INFO ][Thread-33][GridCacheProcessor] Stopped cache:
ignite-atomics-sys-cache
[01:05:01,039][INFO ][Thread-33][IgniteKernal] 

>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 1.6.0#20160518-sha1:0b22c45bb9b97692208fd0705ddf8045ff34a031
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 08:30:57:361

(2) Ignite logs on ves-hx-41:

[00:00:49,968][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3bc24639, name=null, uptime=07:18:03:296]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=1.17%, avg=1.37%, GC=0%]
    ^-- Heap [used=681MB, free=88.91%, comm=2047MB]
    ^-- Non heap [used=131MB, free=-1%, comm=134MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[00:01:50,072][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3bc24639, name=null, uptime=07:19:03:403]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=1.17%, avg=1.37%, GC=0%]
    ^-- Heap [used=617MB, free=89.94%, comm=2047MB]
    ^-- Non heap [used=131MB, free=-1%, comm=134MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
		
(3) GC logs on ves-hx-40

2016-08-11T01:01:55.964+0100: 30484.918: [GC (Allocation Failure)
2016-08-11T01:01:55.964+0100: 30484.918: [ParNew: 130944K->0K(131008K),
0.0222134 secs] 1099957K->970725K(2097088K), 0.0224306 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:01.562+0100: 30490.516: [GC (Allocation Failure)
2016-08-11T01:02:01.562+0100: 30490.516: [ParNew: 130944K->0K(131008K),
0.0189181 secs] 1101669K->971698K(2097088K), 0.0191399 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:07.108+0100: 30496.062: [GC (Allocation Failure)
2016-08-11T01:02:07.108+0100: 30496.062: [ParNew: 130944K->0K(131008K),
0.0179867 secs] 1102642K->971963K(2097088K), 0.0182138 secs] [Times:
user=0.03 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:12.583+0100: 30501.536: [GC (Allocation Failure)
2016-08-11T01:02:12.583+0100: 30501.537: [ParNew: 130944K->0K(131008K),
0.0218757 secs] 1102907K->972258K(2097088K), 0.0221288 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:18.095+0100: 30507.048: [GC (Allocation Failure)
2016-08-11T01:02:18.095+0100: 30507.048: [ParNew: 130944K->0K(131008K),
0.0206840 secs] 1103202K->972384K(2097088K), 0.0209311 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:23.669+0100: 30512.623: [GC (Allocation Failure)
2016-08-11T01:02:23.669+0100: 30512.623: [ParNew: 130944K->0K(131008K),
0.0192695 secs] 1103328K->973391K(2097088K), 0.0194975 secs] [Times:
user=0.04 sys=0.01, real=0.02 secs] 
2016-08-11T01:02:29.260+0100: 30518.214: [GC (Allocation Failure)
2016-08-11T01:02:29.260+0100: 30518.214: [ParNew: 130944K->0K(131008K),
0.0206649 secs] 1104335K->974242K(2097088K), 0.0209092 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:34.821+0100: 30523.775: [GC (Allocation Failure)
2016-08-11T01:02:34.821+0100: 30523.775: [ParNew: 130944K->0K(131008K),
0.0207085 secs] 1105186K->974537K(2097088K), 0.0209536 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:40.361+0100: 30529.315: [GC (Allocation Failure)
2016-08-11T01:02:40.361+0100: 30529.315: [ParNew: 130944K->0K(131008K),
0.0203313 secs] 1105481K->975204K(2097088K), 0.0205796 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:45.865+0100: 30534.819: [GC (Allocation Failure)
2016-08-11T01:02:45.865+0100: 30534.819: [ParNew: 130944K->0K(131008K),
0.0186987 secs] 1106148K->976465K(2097088K), 0.0188950 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:51.460+0100: 30540.414: [GC (Allocation Failure)
2016-08-11T01:02:51.460+0100: 30540.414: [ParNew: 130944K->0K(131008K),
0.0192286 secs] 1107409K->977249K(2097088K), 0.0194516 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:02:56.971+0100: 30545.925: [GC (Allocation Failure)
2016-08-11T01:02:56.971+0100: 30545.925: [ParNew: 130944K->0K(131008K),
0.0240526 secs] 1108193K->978593K(2097088K), 0.0242959 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:02.961+0100: 30551.915: [GC (Allocation Failure)
2016-08-11T01:03:02.961+0100: 30551.915: [ParNew: 130944K->0K(131008K),
0.0207524 secs] 1109537K->979186K(2097088K), 0.0210339 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:07.975+0100: 30556.928: [GC (Allocation Failure)
2016-08-11T01:03:07.975+0100: 30556.928: [ParNew: 130944K->0K(131008K),
0.0230154 secs] 1110130K->980869K(2097088K), 0.0232501 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:13.562+0100: 30562.515: [GC (Allocation Failure)
2016-08-11T01:03:13.562+0100: 30562.515: [ParNew: 130944K->0K(131008K),
0.0185308 secs] 1111813K->981706K(2097088K), 0.0187684 secs] [Times:
user=0.03 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:18.972+0100: 30567.926: [GC (Allocation Failure)
2016-08-11T01:03:18.972+0100: 30567.926: [ParNew: 130944K->0K(131008K),
0.0184323 secs] 1112650K->982939K(2097088K), 0.0186324 secs] [Times:
user=0.03 sys=0.00, real=0.01 secs] 
2016-08-11T01:03:24.473+0100: 30573.426: [GC (Allocation Failure)
2016-08-11T01:03:24.473+0100: 30573.426: [ParNew: 130944K->0K(131008K),
0.0231739 secs] 1113883K->983951K(2097088K), 0.0234473 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:29.986+0100: 30578.940: [GC (Allocation Failure)
2016-08-11T01:03:29.986+0100: 30578.940: [ParNew: 130944K->0K(131008K),
0.0173278 secs] 1114895K->984105K(2097088K), 0.0175596 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:35.562+0100: 30584.516: [GC (Allocation Failure)
2016-08-11T01:03:35.562+0100: 30584.516: [ParNew: 130944K->0K(131008K),
0.0179701 secs] 1115049K->984790K(2097088K), 0.0182000 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:41.063+0100: 30590.017: [GC (Allocation Failure)
2016-08-11T01:03:41.063+0100: 30590.017: [ParNew: 130944K->0K(131008K),
0.0197956 secs] 1115734K->985729K(2097088K), 0.0200094 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:46.639+0100: 30595.592: [GC (Allocation Failure)
2016-08-11T01:03:46.639+0100: 30595.592: [ParNew: 130944K->0K(131008K),
0.0165813 secs] 1116673K->986217K(2097088K), 0.0167872 secs] [Times:
user=0.03 sys=0.00, real=0.01 secs] 
2016-08-11T01:03:52.068+0100: 30601.022: [GC (Allocation Failure)
2016-08-11T01:03:52.069+0100: 30601.022: [ParNew: 130944K->0K(131008K),
0.0187290 secs] 1117161K->987146K(2097088K), 0.0189981 secs] [Times:
user=0.03 sys=0.00, real=0.02 secs] 
2016-08-11T01:03:57.662+0100: 30606.616: [GC (Allocation Failure)
2016-08-11T01:03:57.662+0100: 30606.616: [ParNew: 130944K->0K(131008K),
0.0181821 secs] 1118090K->987894K(2097088K), 0.0184117 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:04:03.164+0100: 30612.117: [GC (Allocation Failure)
2016-08-11T01:04:03.164+0100: 30612.118: [ParNew: 130944K->0K(131008K),
0.0220690 secs] 1118838K->988924K(2097088K), 0.0223413 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:04:08.667+0100: 30617.620: [GC (Allocation Failure)
2016-08-11T01:04:08.667+0100: 30617.621: [ParNew: 130944K->0K(131008K),
0.0239423 secs] 1119868K->990345K(2097088K), 0.0241677 secs] [Times:
user=0.04 sys=0.00, real=0.03 secs] 
2016-08-11T01:04:14.172+0100: 30623.126: [GC (Allocation Failure)
2016-08-11T01:04:14.172+0100: 30623.126: [ParNew: 130944K->0K(131008K),
0.0252451 secs] 1121289K->991455K(2097088K), 0.0255631 secs] [Times:
user=0.05 sys=0.00, real=0.02 secs] 
2016-08-11T01:05:00.072+0100: 30669.025: [GC (Allocation Failure)
2016-08-11T01:05:00.072+0100: 30669.025: [ParNew: 130926K->0K(131008K),
0.0209069 secs] 1122381K->991927K(2097088K), 0.0212086 secs] [Times:
user=0.04 sys=0.00, real=0.02 secs] 
2016-08-11T01:23:07.573+0100: 31756.527: [GC (Allocation Failure)
2016-08-11T01:23:07.573+0100: 31756.527: [ParNew: 130944K->0K(131008K),
0.0808140 secs] 1122871K->999909K(2097088K), 0.0811343 secs] [Times:
user=0.16 sys=0.00, real=0.08 secs] 
2016-08-11T01:57:07.602+0100: 33796.556: [GC (Allocation Failure)
2016-08-11T01:57:07.602+0100: 33796.556: [ParNew: 130944K->0K(131008K),
0.0589586 secs] 1130853K->1006728K(2097088K), 0.0591637 secs] [Times:
user=0.11 sys=0.01, real=0.06 secs] 
2016-08-11T02:31:07.731+0100: 35836.685: [GC (Allocation Failure)
2016-08-11T02:31:07.731+0100: 35836.685: [ParNew: 130944K->0K(131008K),
0.0632614 secs] 1137672K->1013488K(2097088K), 0.0643351 secs] [Times:
user=0.12 sys=0.00, real=0.07 secs] 
2016-08-11T03:05:05.282+0100: 37874.235: [GC (Allocation Failure)
2016-08-11T03:05:05.282+0100: 37874.236: [ParNew: 130944K->0K(131008K),
0.0543631 secs] 1144432K->1020279K(2097088K), 0.0546948 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
2016-08-11T03:39:07.654+0100: 39916.608: [GC (Allocation Failure)
2016-08-11T03:39:07.655+0100: 39916.608: [ParNew: 130944K->0K(131008K),
0.0543531 secs] 1151223K->1027200K(2097088K), 0.0546226 secs] [Times:
user=0.11 sys=0.00, real=0.05 secs] 
2016-08-11T04:13:30.740+0100: 41979.693: [GC (Allocation Failure)
2016-08-11T04:13:30.740+0100: 41979.694: [ParNew: 130944K->0K(131008K),
0.0567574 secs] 1158144K->1034040K(2097088K), 0.0570319 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
2016-08-11T04:47:39.905+0100: 44028.859: [GC (Allocation Failure)
2016-08-11T04:47:39.905+0100: 44028.859: [ParNew: 130944K->0K(131008K),
0.0781639 secs] 1164984K->1040869K(2097088K), 0.0783822 secs] [Times:
user=0.15 sys=0.00, real=0.08 secs] 
2016-08-11T05:21:41.016+0100: 46069.970: [GC (Allocation Failure)
2016-08-11T05:21:41.016+0100: 46069.970: [ParNew: 130944K->0K(131008K),
0.0582058 secs] 1171813K->1047690K(2097088K), 0.0584024 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
2016-08-11T05:56:06.191+0100: 48135.144: [GC (Allocation Failure)
2016-08-11T05:56:06.191+0100: 48135.145: [ParNew: 130944K->0K(131008K),
0.0631365 secs] 1178634K->1054521K(2097088K), 0.0641970 secs] [Times:
user=0.12 sys=0.00, real=0.07 secs] 
2016-08-11T06:30:07.630+0100: 50176.583: [GC (Allocation Failure)
2016-08-11T06:30:07.630+0100: 50176.584: [ParNew: 130944K->0K(131008K),
0.0562791 secs] 1185465K->1061410K(2097088K), 0.0564979 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
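
One way to scan a GC log like the one above for stalls is to compare the JVM uptime stamps (the second number on each event line) of consecutive events. Below is a minimal sketch in Python, assuming the ParNew log format shown above. The name `find_gaps` and the 15-second threshold are illustrative choices, not part of Ignite or the JVM. A long gap only shows that no young collection ran in that window; it does not by itself prove that the JVM was paused.

```python
import re

# Matches the start of a GC event line, e.g.
# "2016-08-11T01:04:14.172+0100: 30623.126: [GC (Allocation Failure)"
# Continuation lines such as "...: [ParNew: ..." do not match.
GC_EVENT = re.compile(r"^(\S+): (\d+\.\d+): \[GC \(")


def find_gaps(lines, threshold_secs=15.0):
    """Return (prev_stamp, next_stamp, gap_secs) for each pair of
    consecutive GC events that are more than threshold_secs apart."""
    gaps = []
    prev = None  # (date stamp, JVM uptime in seconds) of the previous event
    for line in lines:
        m = GC_EVENT.match(line)
        if not m:
            continue
        stamp, uptime = m.group(1), float(m.group(2))
        if prev is not None and uptime - prev[1] > threshold_secs:
            gaps.append((prev[0], stamp, round(uptime - prev[1], 3)))
        prev = (stamp, uptime)
    return gaps


# Three events copied from the ves-hx-40 log above, plus one continuation line.
sample = [
    "2016-08-11T01:04:08.667+0100: 30617.620: [GC (Allocation Failure)",
    "2016-08-11T01:04:08.667+0100: 30617.621: [ParNew: 130944K->0K(131008K),",
    "2016-08-11T01:04:14.172+0100: 30623.126: [GC (Allocation Failure)",
    "2016-08-11T01:05:00.072+0100: 30669.025: [GC (Allocation Failure)",
]

for prev_stamp, next_stamp, gap in find_gaps(sample):
    print(f"{gap:.3f}s between {prev_stamp} and {next_stamp}")
```

On the sample lines this reports the one interval of about 45.9 seconds, between the events at 01:04:14.172 and 01:05:00.072. The preceding events are roughly 5.5 seconds apart. Run over the whole file, it would also flag the much longer intervals after 01:05:00, when the node had already stopped.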

(4) GC logs on ves-hx-41

2016-08-11T00:56:05.110+0100: 29607.742: [GC (Allocation Failure)
2016-08-11T00:56:05.110+0100: 29607.742: [ParNew: 130944K->0K(131008K),
0.0266464 secs] 717163K->587053K(2097088K), 0.0268836 secs] [Times:
user=0.05 sys=0.00, real=0.03 secs] 
2016-08-11T00:58:08.189+0100: 29730.821: [GC (Allocation Failure)
2016-08-11T00:58:08.189+0100: 29730.822: [ParNew: 130944K->0K(131008K),
0.0285321 secs] 717997K->588106K(2097088K), 0.0287730 secs] [Times:
user=0.05 sys=0.00, real=0.03 secs] 
2016-08-11T01:00:11.219+0100: 29853.851: [GC (Allocation Failure)
2016-08-11T01:00:11.219+0100: 29853.851: [ParNew: 130944K->0K(131008K),
0.0281378 secs] 719050K->588719K(2097088K), 0.0283973 secs] [Times:
user=0.05 sys=0.00, real=0.03 secs] 
2016-08-11T01:02:13.178+0100: 29975.810: [GC (Allocation Failure)
2016-08-11T01:02:13.178+0100: 29975.810: [ParNew: 130944K->0K(131008K),
0.0265786 secs] 719663K->589281K(2097088K), 0.0268661 secs] [Times:
user=0.05 sys=0.00, real=0.03 secs] 
2016-08-11T01:04:16.454+0100: 30099.086: [GC (Allocation Failure)
2016-08-11T01:04:16.454+0100: 30099.086: [ParNew: 130944K->0K(131008K),
0.0271111 secs] 720225K->589977K(2097088K), 0.0273313 secs] [Times:
user=0.05 sys=0.00, real=0.03 secs] 
2016-08-11T01:05:36.518+0100: 30179.150: [GC (Allocation Failure)
2016-08-11T01:05:36.518+0100: 30179.150: [ParNew: 130944K->0K(131008K),
0.0756634 secs] 720921K->594946K(2097088K), 0.0759647 secs] [Times:
user=0.14 sys=0.00, real=0.08 secs] 
2016-08-11T01:15:11.086+0100: 30753.718: [GC (Allocation Failure)
2016-08-11T01:15:11.086+0100: 30753.718: [ParNew: 130944K->0K(131008K),
0.0346118 secs] 725890K->597225K(2097088K), 0.0348076 secs] [Times:
user=0.07 sys=0.00, real=0.04 secs] 
2016-08-11T01:35:49.905+0100: 31992.537: [GC (Allocation Failure)
2016-08-11T01:35:49.905+0100: 31992.537: [ParNew: 130944K->0K(131008K),
0.0546067 secs] 728169K->601659K(2097088K), 0.0549078 secs] [Times:
user=0.10 sys=0.00, real=0.06 secs] 
2016-08-11T01:56:24.223+0100: 33226.855: [GC (Allocation Failure)
2016-08-11T01:56:24.223+0100: 33226.855: [ParNew: 130944K->0K(131008K),
0.0502786 secs] 732603K->606321K(2097088K), 0.0505826 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-11T02:16:59.495+0100: 34462.127: [GC (Allocation Failure)
2016-08-11T02:16:59.495+0100: 34462.127: [ParNew: 130944K->0K(131008K),
0.0472343 secs] 737265K->610859K(2097088K), 0.0475460 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-11T02:37:43.455+0100: 35706.087: [GC (Allocation Failure)
2016-08-11T02:37:43.455+0100: 35706.088: [ParNew: 130944K->0K(131008K),
0.0489817 secs] 741803K->615325K(2097088K), 0.0494726 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-11T02:58:04.997+0100: 36927.629: [GC (Allocation Failure)
2016-08-11T02:58:04.997+0100: 36927.629: [ParNew: 130944K->0K(131008K),
0.0402853 secs] 746269K->619851K(2097088K), 0.0404998 secs] [Times:
user=0.08 sys=0.00, real=0.04 secs] 
2016-08-11T03:18:49.133+0100: 38171.765: [GC (Allocation Failure)
2016-08-11T03:18:49.133+0100: 38171.765: [ParNew: 130944K->0K(131008K),
0.0534768 secs] 750795K->624281K(2097088K), 0.0538719 secs] [Times:
user=0.10 sys=0.00, real=0.06 secs] 
2016-08-11T03:39:20.298+0100: 39402.931: [GC (Allocation Failure)
2016-08-11T03:39:20.299+0100: 39402.931: [ParNew: 130944K->0K(131008K),
0.0504364 secs] 755225K->628853K(2097088K), 0.0509379 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 
2016-08-11T03:59:50.008+0100: 40632.640: [GC (Allocation Failure)
2016-08-11T03:59:50.008+0100: 40632.640: [ParNew: 130944K->0K(131008K),
0.0448188 secs] 759797K->633370K(2097088K), 0.0451403 secs] [Times:
user=0.08 sys=0.00, real=0.05 secs] 
2016-08-11T04:20:31.408+0100: 41874.040: [GC (Allocation Failure)
2016-08-11T04:20:31.408+0100: 41874.041: [ParNew: 130944K->0K(131008K),
0.0496370 secs] 764314K->637833K(2097088K), 0.0500137 secs] [Times:
user=0.10 sys=0.01, real=0.05 secs] 
2016-08-11T04:41:13.371+0100: 43116.003: [GC (Allocation Failure)
2016-08-11T04:41:13.371+0100: 43116.003: [ParNew: 130944K->0K(131008K),
0.0493521 secs] 768777K->642405K(2097088K), 0.0496875 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-11T05:01:49.946+0100: 44352.578: [GC (Allocation Failure)
2016-08-11T05:01:49.946+0100: 44352.579: [ParNew: 130944K->0K(131008K),
0.0473887 secs] 773349K->646845K(2097088K), 0.0476503 secs] [Times:
user=0.09 sys=0.01, real=0.04 secs] 
2016-08-11T05:22:20.050+0100: 45582.682: [GC (Allocation Failure)
2016-08-11T05:22:20.050+0100: 45582.683: [ParNew: 130944K->0K(131008K),
0.0476691 secs] 777789K->651393K(2097088K), 0.0480185 secs] [Times:
user=0.09 sys=0.00, real=0.05 secs] 
2016-08-11T05:42:50.038+0100: 46812.670: [GC (Allocation Failure)
2016-08-11T05:42:50.038+0100: 46812.670: [ParNew: 130944K->0K(131008K),
0.0503541 secs] 782337K->655936K(2097088K), 0.0505806 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 
2016-08-11T06:03:33.827+0100: 48056.459: [GC (Allocation Failure)
2016-08-11T06:03:33.828+0100: 48056.460: [ParNew: 130944K->0K(131008K),
0.0544074 secs] 786880K->660387K(2097088K), 0.0548633 secs] [Times:
user=0.11 sys=0.00, real=0.06 secs] 
2016-08-11T06:24:07.159+0100: 49289.791: [GC (Allocation Failure)
2016-08-11T06:24:07.159+0100: 49289.791: [ParNew: 130944K->0K(131008K),
0.0504913 secs] 791331K->664945K(2097088K), 0.0508051 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 
2016-08-11T06:44:49.963+0100: 50532.595: [GC (Allocation Failure)
2016-08-11T06:44:49.963+0100: 50532.595: [ParNew: 130944K->0K(131008K),
0.0538779 secs] 795889K->669395K(2097088K), 0.0543000 secs] [Times:
user=0.10 sys=0.00, real=0.06 secs] 
2016-08-11T07:05:30.285+0100: 51772.917: [GC (Allocation Failure)
2016-08-11T07:05:30.285+0100: 51772.917: [ParNew: 130944K->0K(131008K),
0.0509373 secs] 800339K->673953K(2097088K), 0.0512089 secs] [Times:
user=0.10 sys=0.00, real=0.05 secs] 


(5) JVM arguments for ves-hx-40 (essentially the same as for ves-hx-41)

-Djava.awt.headless=true 
-Dtomcat.hostname=ves-hx-40 
-Dhttp.port=8100 
-Dsecure.port=8101 
-Dshutdown.port=8105 
-DIGNITE_HOME=/nfs/public/rw/webadmin/tomcat/bases/3rd-party/apache-ignite-fabric-1.6.0-bin
-DIGNITE_JETTY_PORT=8574 
-Dtomcat.major.version=7 
-Dwebadmin.path=/nfs/public/rw/webadmin 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Xms2g 
-Xmx6g 
-XX:MaxPermSize=128m 
-Doracle.net.tns_admin= 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=8107 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-DIGNITE_QUIET=false 
-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:+UseTLAB 
-XX:NewSize=128m 
-XX:MaxNewSize=128m 
-XX:MaxTenuringThreshold=0 
-XX:SurvivorRatio=1024 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=60 
-XX:+DisableExplicitGC 
-XX:+PrintGCTimeStamps 
-XX:+PrintGCDateStamps 
-verbose:gc 
-XX:+PrintGCDetails 
-XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 
-XX:GCLogFileSize=100M 
-Xloggc:/nfs/public/rw/webadmin/tomcat/bases/data/ves-hx-40/gc/gc_events.log 
-Djava.endorsed.dirs=/nfs/public/rw/webadmin/tomcat/homes/apache-tomcat-7.0.55/endorsed 
-Dcatalina.base=/nfs/public/rw/webadmin/tomcat/bases 
-Dcatalina.home=/nfs/public/rw/webadmin/tomcat/homes/apache-tomcat-7.0.55 
-Djava.io.tmpdir=/nfs/public/rw/webadmin/tomcat/bases/temp 

(6) Application logs on ves-hx-40

Ever since, the web application has not been usable. Whenever I try to open a
web page, I get the following error message in my application log file:

2016-08-11 09:23:52,129 ERROR root/error 495 - Failed to update web session:
null
java.lang.IllegalStateException: Grid is in invalid state to perform this
operation. It either not started yet or has already being or have stopped
[gridName=null, state=STOPPED]
	at
org.apache.ignite.internal.GridKernalGatewayImpl.illegalState(GridKernalGatewayImpl.java:190)
	at
org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:90)
	at org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:3107)
	at org.apache.ignite.internal.IgniteKernal.cache(IgniteKernal.java:2436)
	at
org.apache.ignite.cache.websession.WebSessionFilter.initCache(WebSessionFilter.java:333)
	at
org.apache.ignite.cache.websession.WebSessionFilter.handleCacheOperationException(WebSessionFilter.java:877)
	at
org.apache.ignite.cache.websession.WebSessionFilter.handleLoadSessionException(WebSessionFilter.java:597)
	at
org.apache.ignite.cache.websession.WebSessionFilter.doFilterV2(WebSessionFilter.java:523)
	at
org.apache.ignite.cache.websession.WebSessionFilter.doFilterDispatch(WebSessionFilter.java:407)
	at
org.apache.ignite.cache.websession.WebSessionFilter.doFilter(WebSessionFilter.java:383)
	at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
	at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
	at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
	at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
	at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
	at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
	at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
	at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
	at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
	
Thanks for your kind help! 

Yuci



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797p6967.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
