ignite-issues mailing list archives

From "Amelchev Nikita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-5115) Investigation of failing tests of coordinator node failure
Date Tue, 18 Dec 2018 14:56:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724140#comment-16724140 ]

Amelchev Nikita commented on IGNITE-5115:

[~akalashnikov], thanks for taking a look at the changes!
I'll add one more node to the test and notify you when it's done.

> Investigation of failing tests of coordinator node failure 
> -----------------------------------------------------------
>                 Key: IGNITE-5115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5115
>             Project: Ignite
>          Issue Type: Bug
>          Components: messaging
>            Reporter: Sergey Chugunov
>            Assignee: Amelchev Nikita
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
> Tests *customEventCoordinatorFailure1/2* from *TcpDiscoverySelfTest* are flaky on TeamCity (TC) and sometimes hang with the following assertion in the logs:
> {code}
> Exception in thread "tcp-disco-msg-worker-#5245%tcp.TcpDiscoverySelfTest0%" java.lang.AssertionError
> 	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.removeNode(TcpDiscoveryNodesRing.java:353)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeFailedMessage(ServerImpl.java:4670)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2567)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2366)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6485)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2456)
> 	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> It seems that this happens because the test implementation drops the *TcpCommunicationSpi* connections on the coordinator node using the *simulateNodeFailure* method.
> At the same time, the tests leave *TcpDiscoverySpi* operational; it receives the subsequent NodeFailed message and throws the assertion error shown above.
> The situation looks legitimate: it is possible for CommSPI connections on the coordinator to fail for some reason while DiscoSPI connections stay healthy.
> We need to investigate the situation further, determine whether the use of *simulateNodeFailure* is the root cause, and propose a solution if the error can happen in real life.
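
The failure mode described in the issue can be illustrated with a small standalone sketch. It is not Ignite code: `RingSketch` and `CoordinatorFailureSketch` are hypothetical names, and the sketch assumes (the issue does not say which assertion at `TcpDiscoveryNodesRing.java:353` fails) that the violated invariant is "a node never removes itself from its own ring view". An explicit `AssertionError` is thrown instead of using `assert`, so the behavior does not depend on the `-ea` JVM flag.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical, simplified stand-in for a discovery ring; NOT Ignite's TcpDiscoveryNodesRing. */
final class RingSketch {
    private final UUID localId;
    private final Set<UUID> nodes = new LinkedHashSet<>();

    RingSketch(UUID localId) {
        this.localId = localId;
        nodes.add(localId);
    }

    void add(UUID id) {
        nodes.add(id);
    }

    /**
     * Removes a node from the ring view.
     * Assumed invariant (an illustration, not confirmed by the issue):
     * the local node must never be asked to remove itself.
     */
    boolean removeNode(UUID id) {
        if (id.equals(localId))
            throw new AssertionError("NodeFailed refers to the local node: " + id);

        return nodes.remove(id);
    }

    int size() {
        return nodes.size();
    }
}

public class CoordinatorFailureSketch {
    public static void main(String[] args) {
        UUID coordinator = UUID.randomUUID();
        UUID other = UUID.randomUUID();

        // Ring view as seen by the coordinator itself.
        RingSketch ring = new RingSketch(coordinator);
        ring.add(other);

        // Normal case: a NodeFailed message about another node is processed fine.
        ring.removeNode(other);

        // Scenario from the issue: communication is dropped but discovery stays alive,
        // so the coordinator still processes a NodeFailed message that names itself.
        try {
            ring.removeNode(coordinator);
        }
        catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

If the real assertion guards a different invariant, the same reasoning still applies: the discovery layer receives a failure notification for a node whose discovery connections were never actually closed.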

This message was sent by Atlassian JIRA
