ignite-dev mailing list archives

From "Ilya Kasnacheev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-7476) Server node will join with failure gathering metrics
Date Fri, 19 Jan 2018 11:59:00 GMT
Ilya Kasnacheev created IGNITE-7476:

             Summary: Server node will join with failure gathering metrics
                 Key: IGNITE-7476
                 URL: https://issues.apache.org/jira/browse/IGNITE-7476
             Project: Ignite
          Issue Type: Bug
            Reporter: Ilya Kasnacheev

Sometimes a server node fails with the following trace:
{code}
SEVERE: TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1149)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5022)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2690)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2491)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6675)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2574)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62){code}
There are two problems here:
 * An uncaught exception in cacheMetrics() unconditionally fails the node, because it is thrown in the discovery thread. All non-trivial code running in that thread should probably be wrapped in try-catch.
 * There is no proper locking when a cache is destroyed (see also IGNITE-6423 and IGNITE-7165).
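
A rough sketch of the try-catch idea from the first point, in case it helps the discussion. This is not Ignite code: the GuardedMetrics class and its collectOrDefault method are hypothetical names made up for illustration, and the actual fix in GridDiscoveryManager may well look different. The idea is only that a failing metrics callback gets logged and replaced with a fallback value instead of propagating into the discovery worker.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of Apache Ignite.
final class GuardedMetrics {
    private static final Logger LOG = Logger.getLogger(GuardedMetrics.class.getName());

    private GuardedMetrics() {
        // Static helper, no instances.
    }

    /**
     * Runs the collector. If it throws a RuntimeException, the failure is
     * logged and the fallback is returned, so the calling thread keeps running.
     * Errors (e.g. OutOfMemoryError) are deliberately not caught.
     */
    static <T> T collectOrDefault(Supplier<T> collector, T fallback) {
        try {
            return collector.get();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to gather metrics, using fallback value", e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulates a metrics callback that fails, e.g. because the cache
        // is being destroyed concurrently (an assumed failure mode).
        Map<Integer, String> metrics = collectOrDefault(() -> {
            throw new IllegalStateException("cache is being destroyed");
        }, Collections.<Integer, String>emptyMap());

        // Execution reaches this line; the caller was not terminated.
        System.out.println("metrics gathered: " + metrics.size());
    }
}
```

Swallowing the exception only covers the first point. The race on cache destroy would still need proper locking, otherwise the node stays up but reports empty metrics for that round.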


