ignite-issues mailing list archives

From "Pavel Kovalenko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-9309) LocalNodeMovingPartitionsCount metrics may calculates incorrect due to processFullPartitionUpdate
Date Thu, 23 Aug 2018 15:00:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590338#comment-16590338 ]

Pavel Kovalenko edited comment on IGNITE-9309 at 8/23/18 2:59 PM:
------------------------------------------------------------------

The actual problem was introduced in https://issues.apache.org/jira/browse/IGNITE-8684 .

The key issue is that partition state changes now happen only after receiving a FullMap with
an exchangeId (PME). There can be a race between handling a FullMap with exchangeId != null (PME)
and a FullMap without an exchangeId. If we receive a fresh FullMap without an exchangeId before
the one with an exchangeId, we override our local partition states, and the FullMap with the
exchangeId is then rejected as outdated. As a result, the partition states are never changed
and no rebalance starts.
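
The race can be illustrated with a small toy model. This is only a sketch and not Ignite code: the class FullMapRaceSketch, the updateSeq freshness counter, the string partition states, and the split between a "recorded" and an "actual" state are all assumptions made for illustration. It assumes that a node rejects any FullMap that is not newer than the last one it accepted, and that real state transitions are applied only for messages carrying an exchangeId.

```java
/** Toy model of the race described above. NOT Ignite code; all names are hypothetical. */
class FullMapRaceSketch {
    /** Simplified full partition map message for a single partition. */
    static final class FullMap {
        final Long exchangeId;    // null means the message was sent outside of PME
        final long updateSeq;     // assumed freshness counter, higher is newer
        final String targetState; // state the partition should move to

        FullMap(Long exchangeId, long updateSeq, String targetState) {
            this.exchangeId = exchangeId;
            this.updateSeq = updateSeq;
            this.targetState = targetState;
        }
    }

    /** Simplified node that tracks one partition. */
    static final class Node {
        long lastSeq;                    // sequence of the last accepted FullMap
        String recordedState = "OWNING"; // what the stored map says
        String actualState = "OWNING";   // what the partition really is
        boolean rebalanceStarted;

        /** Returns false if the message was rejected as outdated. */
        boolean onFullMap(FullMap msg) {
            if (msg.updateSeq <= lastSeq)
                return false; // outdated: a fresher map was already accepted

            lastSeq = msg.updateSeq;
            recordedState = msg.targetState; // the stored map is always overridden

            // Assumption: real state changes are applied only for PME messages.
            if (msg.exchangeId != null && !actualState.equals(msg.targetState)) {
                actualState = msg.targetState;
                rebalanceStarted = "MOVING".equals(actualState);
            }
            return true;
        }
    }

    /** Delivers the two messages in the given order and reports whether rebalance started. */
    static boolean simulate(boolean pmeFirst) {
        Node node = new Node();
        FullMap pme = new FullMap(1L, 1, "MOVING");      // FullMap with exchangeId
        FullMap nonPme = new FullMap(null, 2, "MOVING"); // fresher FullMap without exchangeId

        if (pmeFirst) {
            node.onFullMap(pme);
            node.onFullMap(nonPme);
        } else {
            node.onFullMap(nonPme);
            node.onFullMap(pme); // rejected as outdated, so the state never changes
        }
        return node.rebalanceStarted;
    }

    public static void main(String[] args) {
        System.out.println("PME first, rebalance started: " + simulate(true));
        System.out.println("non-PME first, rebalance started: " + simulate(false));
    }
}
```

In this model, delivering the PME message first starts the rebalance, while delivering the fresher non-PME message first leaves the partition in its old state, because the PME message is then dropped as outdated.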


was (Author: jokser):
The actual problem was introduced in https://issues.apache.org/jira/browse/IGNITE-8684 .

The key problem that partition state changes now happened only after receiving FullMap with
exchangeId (PME). There can be race between handling FullMap with echangeId != null (PME)
and FullMap without exchangeId. If we receive fresh FullMap without exchangeId earlier than
with, we override our local partition states, and FullMap with exchangeId will be rejected
as outdated. It means that the partition states will not be changed and no rebalance will
start.

> LocalNodeMovingPartitionsCount metrics may calculates incorrect due to processFullPartitionUpdate
> -------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9309
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9309
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Maxim Muzafarov
>            Priority: Major
>
> [~qvad] has found an incorrect {{LocalNodeMovingPartitionsCount}} metrics calculation on client node {{JOIN\LEFT}}. A full reproducer for the issue is absent.
> Probable scenario:
> {code}
> Repeat 10 times:
> 1. stop node
> 2. clean lfs
> 3. add stopped node (trigger rebalance)
> 4. 3 times: start 2 clients, wait for topology snapshot, close clients
> 5. for each cache group check JMX metrics LocalNodeMovingPartitionsCount (like waitForFinishRebalance())
> {code}
> Whole discussion and all configuration details can be found in comments of [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)