ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Вячеслав Коптилин <slava.kopti...@gmail.com>
Subject Re: TcpDiscoverySpi worker thread failed with assertion error
Date Mon, 10 Jul 2017 10:21:25 GMT
Hi Juan,

U.currentTimeMillis(), which is based on System.currentTimeMillis(), is not
monotonic,
so there is no guarantee that returned values will always be greater than
all previous calls.

please see the following ticket:
https://issues.apache.org/jira/browse/IGNITE-5562

Thanks,
Slava.

2017-07-10 12:34 GMT+03:00 Juan Barani <Juan.Barani@unitedplanet.com>:

> Hi everyone,
>
>
>
> We are testing a new application running Ignite 2.0 with 3 nodes running
> on Azure platform.
>
>
>
> We left the cluster running a couple of days without load, and randomly
> one of the Nodes stopped with the following stacktrace:
>
>
>
> INFO  2017-07-07T23:07:28,475 -
>
> org.apache.ignite.internal.IgniteKernal%IntrexxDataGrid[
> grid-timeout-worker-#15%IntrexxDataGrid%]
>
>   FreeList [name=IntrexxDataGrid, buckets=256, dataPages=3, reusePages=0]
> INFO  2017-07-07T23:07:28,475 - org.apache.ignite.internal.
> IgniteKernal%IntrexxDataGrid[grid-timeout-worker-#15%IntrexxDataGrid%]
>
>   FreeList [name=IntrexxDataGrid, buckets=256, dataPages=9, reusePages=0]
> ERROR 2017-07-07T23:08:10,632 - org.apache.ignite.spi.
> discovery.tcp.TcpDiscoverySpi[tcp-disco-msg-worker-#2%IntrexxDataGrid%]
>
>
>
>
>
>   TcpDiscoverSpi's message worker thread failed abnormally. Stopping the
> node in order to prevent cluster wide instability.
>
> java.lang.AssertionError: -21
>
>     at
>
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.
> onMessageSent(TcpDiscoveryStatistics.java:317)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> sendMessageAcrossRing(ServerImpl.java:2980)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMetricsUpdateMessage(ServerImpl.java:5001)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMessage(ServerImpl.java:2562)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMessage(ServerImpl.java:2364)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(
> ServerImpl.java:6460)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(
> ServerImpl.java:2450)
>
> [ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>
> [ignite-core-2.0.0.jar:2.0.0]
>
> ERROR 2017-07-07T23:08:10,661 -
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi[
> tcp-disco-msg-worker-#2%IntrexxDataGrid%]
>
>   Runtime error caught during grid runnable execution: IgniteSpiThread
> [name=tcp-disco-msg-worker-#2%IntrexxDataGrid%]
>
> java.lang.AssertionError: -21
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.
> onMessageSent(TcpDiscoveryStatistics.java:317)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> sendMessageAcrossRing(ServerImpl.java:2980)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMetricsUpdateMessage(ServerImpl.java:5001)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMessage(ServerImpl.java:2562)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.
> processMessage(ServerImpl.java:2364)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(
> ServerImpl.java:6460)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(
> ServerImpl.java:2450)
>
> ~[ignite-core-2.0.0.jar:2.0.0]
>
>      at
>
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>
>
>
> I researched online, and it seems that the thread fails when there is a
> long gc pause (>10 s), or not so much memory. In all those cases it comes
> also an InterruptedException.
>
>
>
> However, that does not seem to be the case, since we do have enough
> memory, and no load whatsoever. What strikes me the most is the assertion
> error:
>
>
>
>   TcpDiscoverSpi's message worker thread failed abnormally. Stopping the
> node in order to prevent cluster wide instability.
>
> java.lang.AssertionError: -21
>
>
>
> This is an assertion in method onMessageSent:
>
>
>
> assert time >= 0 : time;
>
>
>
> which in turns comes from (ServerImpl.java:2980):
>
>
>
> spi.stats.onMessageSent(msg, tstamp0 - tstamp);
>
>
>
> tstamp0 and tsamp are in turn values returned from U.currentTimeMillis();
>
>
>
> which if I understand correctly should return an ever increasing number
> (in 10 seconds interval).
>
>
>
> Hence I cannot figure out why this asserted value is negative.
>
>
>
> Can someone give us a hint what we should do?
>
>
>
> Thanks!
>
>
>
> Juan
>
>
>
>
>
>
>
>
>

Mime
View raw message