ignite-issues mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used
Date Fri, 11 Jan 2019 14:27:00 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Goncharuk updated IGNITE-10898:
    Ignite Flags:   (was: Docs Required)

> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>                 Key: IGNITE-10898
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10898
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Priority: Major
> Currently, if a node does not pass the cache node filter, we do not store this cache's affinity on the node unless the node is the coordinator. This, however, may fail in the following scenario:
> 1) A node passing the node filter joins the cluster.
> 2) During the join, the coordinator fails, and a new coordinator is selected for which the previous exchange is completed.
> 3) The new coordinator attempts to fetch the affinity, and the joining node resends the partitions single message. There are two problems here. First, the exchange fast-reply does not wait for the new affinity initialization, which results in an {{IllegalStateException}}. Second, such an attempt to fetch the affinity may lead either to a deadlock or to an incorrectly fetched affinity (basically, the coordinator must be in consensus with the other nodes passing the node filter).
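The storage rule described above, and the gap a coordinator failover opens, can be modeled in a few lines. This is a hypothetical, simplified Java model, not Ignite code: the node names, the `data-` filter, and the `storesAffinity`/`holders` helpers are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class CurrentAffinityRule {
    // Hypothetical model of the current rule: keep a cache's affinity only when
    // the node passes the cache node filter or is the coordinator.
    static boolean storesAffinity(boolean passesFilter, boolean isCoordinator) {
        return passesFilter || isCoordinator;
    }

    // Nodes that hold the cache affinity while the given node is coordinator.
    static Set<String> holders(List<String> nodes, Predicate<String> filter, String coordinator) {
        Set<String> res = new TreeSet<>();
        for (String n : nodes)
            if (storesAffinity(filter.test(n), n.equals(coordinator)))
                res.add(n);
        return res;
    }

    public static void main(String[] args) {
        // Hypothetical filter: the cache is deployed only on "data-*" nodes.
        Predicate<String> filter = n -> n.startsWith("data-");
        List<String> nodes = List.of("client-1", "client-2", "data-1");

        // client-1 is coordinator, so it keeps the affinity despite failing the filter.
        Set<String> before = holders(nodes, filter, "client-1");
        System.out.println(before); // [client-1, data-1]

        // client-1 fails and client-2 takes over, but client-2 never stored the
        // affinity, so it has to fetch or recalculate it: the path that breaks.
        System.out.println("client-2 had affinity: " + before.contains("client-2")); // false
    }
}
```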
> The attached test reproduces the issue.
> I suggest always calculating and keeping the affinity on all nodes, even those not passing the filter. In this case, there will be no need to fetch and recalculate the affinity ({{initCoordinatorCaches}} will go away).
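The proposal relies on every node, including one outside the filter, computing the same affinity on its own. Below is a minimal sketch, assuming affinity is a pure function of the node set and the node filter (real Ignite affinity also depends on exchange state, which this ignores). It uses a simplified rendezvous (highest-random-weight) hash; it is not Ignite's `RendezvousAffinityFunction`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class LocalAffinity {
    // Assigns each partition to the filter-passing node with the highest weight.
    // Any node (even one failing the filter) can run this with the same inputs.
    static List<String> assign(int partitions, List<String> topology, Predicate<String> nodeFilter) {
        List<String> eligible = new ArrayList<>();
        for (String n : topology)
            if (nodeFilter.test(n))
                eligible.add(n);

        List<String> owners = new ArrayList<>(partitions);
        for (int p = 0; p < partitions; p++) {
            String best = null;
            long bestWeight = Long.MIN_VALUE;
            for (String n : eligible) {
                long w = mix(n.hashCode() * 31L + p);
                // Tie-break on node id so the result never depends on list order.
                if (best == null || w > bestWeight || (w == bestWeight && n.compareTo(best) < 0)) {
                    best = n;
                    bestWeight = w;
                }
            }
            owners.add(best); // null when no node passes the filter
        }
        return owners;
    }

    // 64-bit mixing step (SplitMix64 finalizer) used as the per-(node, partition) weight.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        Predicate<String> filter = n -> n.startsWith("data-");
        // Two nodes see the same topology, listed in a different order.
        List<String> viewA = List.of("client-1", "data-1", "data-2", "data-3");
        List<String> viewB = List.of("data-3", "client-1", "data-2", "data-1");
        System.out.println(assign(8, viewA, filter).equals(assign(8, viewB, filter))); // true
    }
}
```

Because each partition owner is the maximum under a total order, the result does not depend on the order in which a node sees the topology. A node outside the filter (here `client-1`) therefore gets the same answer as the data nodes without fetching anything.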

This message was sent by Atlassian JIRA
