ignite-issues mailing list archives

From "Maxim Muzafarov (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-8828) Detecting and stopping unresponsive nodes during Partition Map Exchange
Date Tue, 08 Oct 2019 13:19:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946854#comment-16946854 ]

Maxim Muzafarov commented on IGNITE-8828:

Moved to 2.9 due to inactivity. Please feel free to move it back if you are able to complete
the ticket by the 2.8 code freeze date, December 2, 2019.

> Detecting and stopping unresponsive nodes during Partition Map Exchange
> -----------------------------------------------------------------------
>                 Key: IGNITE-8828
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8828
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>            Reporter: Sergey Chugunov
>            Priority: Major
>              Labels: iep-25
>             Fix For: 2.8
>   Original Estimate: 264h
>  Remaining Estimate: 264h
> During the PME process, the coordinator (1) gathers local partition maps from all nodes and (2)
> sends the calculated full partition map back to all nodes in the topology.
> However, if one or more nodes fail to send their local information in step 1 for any reason,
> the PME process hangs, blocking all operations. Currently the only solution is to manually identify
> and stop the nodes that failed to send their info to the coordinator.
> This should be done by the coordinator itself: if it does not receive local partition
> maps from some nodes in time, it should check that stopping these nodes won't lead to data loss and
> then stop them forcibly.
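
The proposed coordinator check could be sketched roughly as follows. This is only an
illustration of the idea in the description, not Ignite code: the class PmeStragglerCheck,
its methods, and the string node IDs are all hypothetical, and it assumes the coordinator
still holds the last known partition-to-owners mapping and calls this logic once its wait
timeout has expired.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the Ignite code base.
public class PmeStragglerCheck {

    /** Returns the nodes in the topology that have not sent a local partition map. */
    static Set<String> stragglers(Set<String> topology, Set<String> responded) {
        Set<String> missing = new TreeSet<>(topology);
        missing.removeAll(responded);
        return missing;
    }

    /**
     * Returns true if every partition keeps at least one owner after the given
     * nodes are stopped. Assumption: owners reflects the last known full
     * partition map. A partition with no owners at all is treated as unsafe.
     */
    static boolean safeToStop(Map<Integer, Set<String>> owners, Set<String> toStop) {
        for (Set<String> nodes : owners.values()) {
            if (toStop.containsAll(nodes)) {
                return false; // stopping these nodes would remove every copy
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> topology = Set.of("A", "B", "C");
        Set<String> responded = Set.of("A", "B"); // C did not answer in time

        // Partition 0 is owned by A and C, partition 1 by B and C.
        Map<Integer, Set<String>> owners = Map.of(
            0, Set.of("A", "C"),
            1, Set.of("B", "C"));

        Set<String> missing = stragglers(topology, responded);
        if (safeToStop(owners, missing)) {
            System.out.println("Safe to stop: " + missing);
        } else {
            System.out.println("Stopping " + missing + " would lose data");
        }
    }
}
```

How the timeout is measured and how a node is actually stopped are left out on purpose,
since the ticket does not specify them.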
