ignite-issues mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (IGNITE-11394) Infinite No next node in topology messages during node restart scenario
Date Mon, 25 Feb 2019 11:45:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Goncharuk reassigned IGNITE-11394:

    Assignee: Alexey Goncharuk

> Infinite No next node in topology messages during node restart scenario
> -----------------------------------------------------------------------
>                 Key: IGNITE-11394
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11394
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Goncharuk
>            Assignee: Alexey Goncharuk
>            Priority: Major
> I observe a situation with the following symptoms during a cyclic restart of nodes:
>  - A node joining the cluster sends a join request, receives a NodeAddedMessage, and awaits a NodeAddFinishedMessage
>  - The node receives a metrics update message, which sits in its message queue
>  - The whole cluster is restarted and a new ring is formed
>  - The node re-sends the join request, and the new ring processes it successfully
>  - The joining node receives the NodeAddedMessage
>  - The node detects that it cannot send messages (the failed nodes collection contains all remote nodes of the ring)
>  - Since a metrics update message was already in the queue, the node attempts to re-add it to the queue. Because the metrics update message is a high-priority message, it is added to the head of the queue, and the node gets stuck in an infinite loop.
> I suggest dropping the metrics update message in {{sendMessageAcrossRing}} when we hit the {{No next node in topology}} situation.
> Another question is why we don't pass the collection of failed nodes to {{ring.hasRemoteNodes()}}.
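A simplified, single-threaded model can illustrate both the livelock in the scenario above and the suggested fix. This is a hypothetical sketch, not Ignite's discovery code. It rests on these assumptions:
- `RingQueueModel`, `Msg`, `process` and `scenario` are invented names.
- The "No next node in topology" condition is modelled as a boolean flag under which every send fails.
- The worker queue is a `Deque` in which re-added high-priority messages go to the head, as the report describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the discovery worker queue (not Ignite code).
class RingQueueModel {
    enum Msg {
        METRICS_UPDATE, NODE_ADD_FINISHED;

        // Assumption for this model: only metrics updates are high priority.
        boolean highPriority() { return this == METRICS_UPDATE; }
    }

    /**
     * Processes the queue for at most maxSteps iterations.
     * noNextNode models "No next node in topology": every send attempt fails.
     * dropMetricsOnNoNextNode enables the fix suggested in the report.
     * Returns true if NODE_ADD_FINISHED was reached, false if the budget ran out.
     */
    static boolean process(Deque<Msg> queue, boolean noNextNode,
                           boolean dropMetricsOnNoNextNode, int maxSteps) {
        for (int step = 0; step < maxSteps && !queue.isEmpty(); step++) {
            Msg msg = queue.pollFirst();

            if (msg == Msg.NODE_ADD_FINISHED)
                return true; // the joining node can complete its join

            if (!noNextNode)
                continue; // send succeeded; the message leaves the queue

            if (dropMetricsOnNoNextNode && msg == Msg.METRICS_UPDATE)
                continue; // suggested fix: discard the metrics update

            // Behaviour from the report: re-add the unsent message. A
            // high-priority message goes back to the head and is polled again.
            if (msg.highPriority())
                queue.addFirst(msg);
            else
                queue.addLast(msg);
        }
        return false;
    }

    // Queue state from the report: a metrics update ahead of NodeAddFinished.
    static Deque<Msg> scenario() {
        Deque<Msg> q = new ArrayDeque<>();
        q.addLast(Msg.METRICS_UPDATE);
        q.addLast(Msg.NODE_ADD_FINISHED);
        return q;
    }

    public static void main(String[] args) {
        System.out.println("requeue: " + process(scenario(), true, false, 1_000)); // prints "requeue: false"
        System.out.println("drop:    " + process(scenario(), true, true, 1_000));  // prints "drop:    true"
    }
}
```

With re-queueing, the step budget runs out while the metrics update is still at the head, so `NODE_ADD_FINISHED` is never reached. With dropping enabled, the queue drains on the second step. Dropping is presumably acceptable for metrics updates because they are sent periodically, so a later update would supersede a discarded one. That reasoning is an assumption and is not stated in the report.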
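A small hypothetical sketch makes the question about {{ring.hasRemoteNodes()}} concrete. In the reported situation every remote ring member is in the failed set. A check that ignores failed nodes still answers "yes, there are remote nodes", while a failed-aware check answers "no". The class `RingModel`, the `String` node IDs, and the overload taking a failed-node collection are assumptions for illustration only. They are not Ignite's actual `TcpDiscoveryNodesRing` API.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical model of a discovery ring (not Ignite's TcpDiscoveryNodesRing).
class RingModel {
    private final String localId;
    private final List<String> nodes; // all ring members, including the local node

    RingModel(String localId, List<String> nodes) {
        this.localId = localId;
        this.nodes = List.copyOf(nodes);
    }

    // Counts every non-local node, even ones already known to have failed.
    boolean hasRemoteNodes() {
        for (String id : nodes)
            if (!id.equals(localId))
                return true;
        return false;
    }

    // Hypothetical overload: nodes in 'failed' are treated as absent from the ring.
    boolean hasRemoteNodes(Collection<String> failed) {
        for (String id : nodes)
            if (!id.equals(localId) && !failed.contains(id))
                return true;
        return false;
    }

    public static void main(String[] args) {
        RingModel ring = new RingModel("joining", List.of("joining", "A", "B"));
        Set<String> failed = Set.of("A", "B");
        System.out.println(ring.hasRemoteNodes());       // prints "true"
        System.out.println(ring.hasRemoteNodes(failed)); // prints "false"
    }
}
```

Under this model, only the failed-aware variant reports that the joining node has no live ring neighbours.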

This message was sent by Atlassian JIRA
