ignite-issues mailing list archives

From "Alexey Goncharuk (Jira)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave
Date Wed, 09 Oct 2019 11:38:00 GMT

https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947584#comment-16947584

Alexey Goncharuk commented on IGNITE-9913:

[~NSAmelchev], [~avinogradov], a few comments for the PR:
 * The {{localRecoveryNeeded}} check does not seem right - you check the list of partitions
from the affinity assignment cache. There may be a case when a node still owns a partition,
but it is not an assigned backup for that partition (this will happen right after a late affinity
assignment change, when the affinity cache is switched to the ideal assignment, but the node
has not yet RENTed the partition). In this case, the partition will not be reported in the
list of partitions and recovery will be skipped.
 * Do I understand correctly that *all* new transactions will still wait for this optimized
PME to complete? If yes, what is the actual time boost that this change gives? Do you have
any benchmark numbers? If no, how do you order transactions on a new primary node with the
backup transactions on the same node that have not finished recovery yet?
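To illustrate the first point above, here is a minimal, self-contained sketch (illustrative Java only - `assignedBackups`, `ownedPartitions`, and the check itself are hypothetical names, not Ignite internals) showing how a recovery check driven purely by the ideal affinity assignment can skip a partition the node still owns:

```java
import java.util.*;

// Hypothetical sketch: a recovery-needed check based only on the ideal
// affinity assignment misses partitions the node still owns but is no
// longer an assigned backup for (owned, not yet RENTed).
public class RecoveryCheckSketch {
    public static void main(String[] args) {
        // Partitions the ideal (late) affinity assignment maps to this node.
        Set<Integer> assignedBackups = new HashSet<>(Arrays.asList(0, 1, 2));

        // Partitions this node actually owns; partition 3 is still owned
        // because it has not been RENTed since the assignment changed.
        Set<Integer> ownedPartitions = new HashSet<>(Arrays.asList(0, 1, 2, 3));

        // Check driven by the affinity cache alone: skips recovery for 3.
        boolean recoveryByAssignment = assignedBackups.contains(3);

        // Check driven by actual partition state: recovery is still needed.
        boolean recoveryByOwnership = ownedPartitions.contains(3);

        System.out.println(recoveryByAssignment + " " + recoveryByOwnership);
        // prints: false true
    }
}
```

The gap between the two booleans is exactly the window described in the comment: the check should consult the node's actual partition states rather than the assignment cache.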

> Prevent data updates blocking in case of backup BLT server node leave
> ---------------------------------------------------------------------
>                 Key: IGNITE-9913
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9913
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>            Reporter: Ivan Rakov
>            Assignee: Anton Vinogradov
>            Priority: Major
>             Fix For: 2.8
>         Attachments: 9913_yardstick.png, master_yardstick.png
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
> Ignite cluster performs distributed partition map exchange when any server node leaves
or joins the topology.
> Distributed PME blocks all updates and may take a long time. If all partitions are assigned
according to the baseline topology and a server node leaves, there is no actual need to perform
a distributed PME: every cluster node is able to recalculate the new affinity assignments and
partition states locally. If we implement such a lightweight PME and handle mapping and lock
requests on the new topology version correctly, updates won't be stopped (except updates of
partitions that lost their primary copy).
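The lightweight PME described above relies on the affinity function being deterministic, so every node computes the same new assignment without exchanging messages. A minimal sketch (illustrative Java with a toy rendezvous-style weight, not Ignite's actual `RendezvousAffinityFunction`) of recomputing primaries locally after a node leaves:

```java
import java.util.*;

// Hedged sketch, not Ignite code: with a deterministic affinity function,
// every surviving node can recompute each partition's primary locally when
// a baseline node leaves, with no distributed exchange round.
public class LocalAffinitySketch {
    // Rendezvous (highest-weight) choice of primary for a partition.
    // Objects.hash is deterministic for the same inputs on every JVM node.
    static String primary(int partition, List<String> nodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String node : nodes) {
            long w = Objects.hash(partition, node); // deterministic weight
            if (w > bestWeight) {
                bestWeight = w;
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("A", "B", "C");
        List<String> after = Arrays.asList("A", "C"); // node B left

        // Each node evaluates this loop independently and gets identical results.
        for (int p = 0; p < 4; p++)
            System.out.println("partition " + p + ": "
                + primary(p, before) + " -> " + primary(p, after));
    }
}
```

Only partitions whose primary was on the departed node change owner; the rest keep their mapping, which is why updates to them need not be blocked.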

This message was sent by Atlassian Jira
