ignite-issues mailing list archives

From "Alexey Kuznetsov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-5968) Test fail in Ignite Cache 2: GridCachePartitionNotLoadedEventSelfTest.testPrimaryAndBackupDead
Date Mon, 28 May 2018 15:26:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492783#comment-16492783 ]

Alexey Kuznetsov edited comment on IGNITE-5968 at 5/28/18 3:25 PM:
-------------------------------------------------------------------

[~DmitriyGovorukhin] [~agoncharuk] 
The bug is that the "lost partition" event is fired only on the new primary node, not on the new backup
(after the old primary and backup nodes go down).

The partition loss policy is _IGNORE_. The test scenario is as follows:

{code:java}
startGrid(0);
startGrid(1);
startGrid(2);
startGrid(3);

ignite(2).events().localListen(lsnr1, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);
ignite(3).events().localListen(lsnr2, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);

cache.put(key1, key1); // node 0 is primary for key1, node 1 is backup for key1.

stopGrid(0, true);
stopGrid(1, true); // after both nodes are stopped, the partition for key1 is lost.

// Node 2 is the new primary node for key1, node 3 is the new backup node for key1.

// EVT_CACHE_REBALANCE_PART_DATA_LOST is fired only on the new primary node.
checkEventIsFired(lsnr1, lsnr2);
{code}
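
The scenario above calls checkEventIsFired without showing it. A minimal sketch of how such a check could be written in plain Java follows, so that a missing event fails quickly instead of running into the test timeout. Everything here is hypothetical: LatchListener stands in for lsnr1/lsnr2, onEvent() stands in for the real event callback, and the extra timeout argument is an assumption, not the signature used in the actual test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchListenerSketch {
    /** Hypothetical stand-in for lsnr1/lsnr2: releases its latch when an event arrives. */
    static final class LatchListener {
        private final CountDownLatch latch = new CountDownLatch(1);

        /** Would be called from the real event callback; returns true to stay subscribed. */
        public boolean onEvent() {
            latch.countDown();
            return true;
        }

        /** Waits up to timeoutMs for the event; returns false if none arrived in time. */
        boolean awaitFired(long timeoutMs) throws InterruptedException {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Returns true only if every listener saw an event; each listener gets up to timeoutMs. */
    public static boolean checkEventIsFired(long timeoutMs, LatchListener... lsnrs) {
        try {
            for (LatchListener l : lsnrs) {
                if (!l.awaitFired(timeoutMs))
                    return false;
            }
            return true;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        LatchListener lsnr1 = new LatchListener();
        LatchListener lsnr2 = new LatchListener();

        lsnr1.onEvent(); // only the "new primary" listener receives the event

        System.out.println("all fired: " + checkEventIsFired(100, lsnr1, lsnr2));
    }
}
```

In main only the first listener is notified, which mirrors the reported behaviour: the check does not pass until the backup-side listener is notified as well.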

When the 2 nodes holding the partition for key1 crash, the "lost partition" event is fired
only on the new primary node (not on the backup).

The root cause of this bug is that the new primary node *does not set* the LOST state on the partitions;
instead it pretends that no partition loss has happened and clears the partition loss state
right away, see _GridDhtPartitionTopologyImpl#detectLostPartitions_.
The primary node then sends the partition map to the backup node, and the backup node detects *no* lost partitions.
So no events are fired on the backup node.

One solution is to broadcast the partition map together with the lost partitions via _GridDhtPartitionsFullMessage_.
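
To illustrate the difference, here is a toy model in plain Java. It is only a sketch and does not use any Ignite classes: Node, State, applyFullMap and eventCounts are made-up names, and the "full map" is just a HashMap standing in for the partition map carried by the full message. The assumption is that a node fires a "lost partition" event for each partition it sees in the LOST state.

```java
import java.util.HashMap;
import java.util.Map;

class LostPartitionModel {
    enum State { OWNING, LOST }

    /** Toy node: a local partition-state map plus a counter of fired "lost partition" events. */
    static final class Node {
        final Map<Integer, State> parts = new HashMap<>();
        int lostEvents;

        /** Applies a partition map received from the primary and fires one event per LOST entry. */
        void applyFullMap(Map<Integer, State> fullMap) {
            for (Map.Entry<Integer, State> e : fullMap.entrySet()) {
                parts.put(e.getKey(), e.getValue());
                if (e.getValue() == State.LOST)
                    lostEvents++;
            }
        }
    }

    /**
     * Simulates the exchange after the old primary and backup have left.
     * broadcastLost = false models the reported behaviour, true models the proposed one.
     * Returns {events on new primary, events on new backup}.
     */
    public static int[] eventCounts(boolean broadcastLost) {
        int part = 7; // arbitrary partition id, chosen for illustration only
        Node primary = new Node();
        Node backup = new Node();

        // The new primary detects the loss locally and fires its own event...
        primary.lostEvents++;
        // ...then clears the loss state right away instead of keeping LOST.
        primary.parts.put(part, State.OWNING);

        Map<Integer, State> fullMap = new HashMap<>(primary.parts);
        if (broadcastLost)
            fullMap.put(part, State.LOST); // proposed: the map sent out still carries the lost partition

        backup.applyFullMap(fullMap);
        return new int[] {primary.lostEvents, backup.lostEvents};
    }

    public static void main(String[] args) {
        int[] cur = eventCounts(false);
        int[] fix = eventCounts(true);
        System.out.println("current:  primary=" + cur[0] + ", backup=" + cur[1]);
        System.out.println("proposed: primary=" + fix[0] + ", backup=" + fix[1]);
    }
}
```

In this model the backup only learns about the loss if the map it receives still marks the partition as LOST, which is the point of the proposal.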

Do you agree with this solution?


> Test fail in Ignite Cache 2: GridCachePartitionNotLoadedEventSelfTest.testPrimaryAndBackupDead
> ----------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5968
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5968
>             Project: Ignite
>          Issue Type: Test
>    Affects Versions: 2.1
>            Reporter: Dmitriy Govorukhin
>            Assignee: Alexey Kuznetsov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.6
>
>
> java.util.concurrent.TimeoutException: Test has been timed out [test=testPrimaryAndBackupDead, timeout=300000]
>     at org.apache.ignite.testframework.junits.GridAbstractTest.runTest(GridAbstractTest.java:1949)
>     at junit.framework.TestCase.runBare(TestCase.java:141)
>     at junit.framework.TestResult$1.protect(TestResult.java:122)
>     at junit.framework.TestResult.runProtected(TestResult.java:142)
>     at junit.framework.TestResult.run(TestResult.java:125)
>     at junit.framework.TestCase.run(TestCase.java:129)
>     at junit.framework.TestSuite.runTest(TestSuite.java:255)
>     at junit.framework.TestSuite.run(TestSuite.java:250)
>     at junit.framework.TestSuite.runTest(TestSuite.java:255)
>     at junit.framework.TestSuite.run(TestSuite.java:250)
>     at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
