ignite-issues mailing list archives

From "Ignite TC Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-5115) Investigation of failing tests of coordinator node failure
Date Thu, 15 Nov 2018 10:58:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687828#comment-16687828 ]

Ignite TC Bot commented on IGNITE-5115:

{panel:title=No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity Run All Results|http://ci.ignite.apache.org/viewLog.html?buildId=2317539&buildTypeId=IgniteTests24Java8_RunAll]

> Investigation of failing tests of coordinator node failure 
> -----------------------------------------------------------
>                 Key: IGNITE-5115
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5115
>             Project: Ignite
>          Issue Type: Task
>          Components: messaging
>            Reporter: Sergey Chugunov
>            Assignee: Amelchev Nikita
>            Priority: Major
> Tests *customEventCoordinatorFailure1/2* from *TcpDiscoverySelfTest* are flaky on TeamCity and sometimes hang with the following assertion in the logs:
> {code}
> Exception in thread "tcp-disco-msg-worker-#5245%tcp.TcpDiscoverySelfTest0%" java.lang.AssertionError
> 	at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.removeNode(TcpDiscoveryNodesRing.java:353)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeFailedMessage(ServerImpl.java:4670)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2567)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2366)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6485)
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2456)
> 	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> It seems that this happens because the tests drop *TcpCommunicationSpi* connections on the coordinator node via the *simulateNodeFailure* method.
> At the same time, the tests leave *TcpDiscoverySpi* operational; it receives a subsequent NodeFailed message and throws the assertion error shown above.
> The whole scenario looks legitimate: CommSPI connections on the coordinator could plausibly fail for some reason while DiscoSPI connections remain healthy.
> The situation needs deeper investigation: determine whether the root cause is the use of *simulateNodeFailure* and, if the error can also happen in real life, propose a solution.
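To make the suspected mechanism easier to reason about, here is a toy Java model. It is a hypothetical sketch, not Ignite code: the class RingSketch, its methods and the segmented flag are invented for illustration. It also assumes, without having checked TcpDiscoveryNodesRing.java:353, that the failing assertion is an invariant saying a node never removes itself from its own ring view, and that the NodeFailed message in question names the coordinator itself. Under those assumptions, a still-healthy DiscoSPI on the coordinator that receives such a message would trip the check, while a guarded handler could route the case to separate handling instead.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical, heavily simplified model of a discovery ring.
 * NOT Ignite's TcpDiscoveryNodesRing: names and invariants are assumptions.
 */
final class RingSketch {
    private final UUID localId;
    private final Set<UUID> nodes = new LinkedHashSet<>();
    private boolean segmented; // hypothetical flag for "local node was declared failed"

    RingSketch(UUID localId) {
        this.localId = localId;
        nodes.add(localId);
    }

    void addNode(UUID id) {
        nodes.add(id);
    }

    boolean contains(UUID id) {
        return nodes.contains(id);
    }

    boolean isSegmented() {
        return segmented;
    }

    /** Assumed invariant: the local node never removes itself from its own ring. */
    boolean removeNode(UUID id) {
        if (id.equals(localId))
            throw new AssertionError("Local node cannot remove itself: " + id);
        return nodes.remove(id);
    }

    /** Naive NodeFailed handling: forwards every failed node id to removeNode. */
    boolean processNodeFailed(UUID failedId) {
        return removeNode(failedId);
    }

    /** Hypothetical guarded variant: "I was declared failed" is handled separately. */
    boolean processNodeFailedGuarded(UUID failedId) {
        if (failedId.equals(localId)) {
            segmented = true; // e.g. hand off to segmentation handling instead of asserting
            return false;
        }
        return nodes.remove(failedId);
    }
}
```

With the naive handler, a NodeFailed message about the coordinator reaching the coordinator itself ends in an AssertionError; the guarded variant leaves the ring unchanged and only sets the flag. Whether real Ignite should stop the node, trigger segmentation, or ignore the message is exactly the open question in this issue; the sketch does not answer it.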

This message was sent by Atlassian JIRA
