ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ignite TC Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME
Date Fri, 16 Aug 2019 08:46:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908852#comment-16908852
] 

Ignite TC Bot commented on IGNITE-9562:
---------------------------------------

{panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers (41)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503638]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503679]]

{color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503632]]

{color:#d04437}MVCC Cache{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503639]]
* IgniteCacheMvccTestSuite: CacheMvccPartitionedCoordinatorFailoverTest.testGetReadInsideTxInProgressCoordinatorFails_ReadDelay
- Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache (Restarts) 2{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503654]]
* IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode - Test has low fail
rate in base branch 0,0% and is not flaky

{color:#d04437}PDS 1{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=4503673]]
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - Test has low fail
rate in base branch 0,0% and is not flaky
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly - Test has
low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 7{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503662]]
* IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable - Test has low
fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503661]]
* IgniteCacheTestSuite6: TxRollbackOnTimeoutNearCacheTest.testSimple - Test has low fail rate
in base branch 0,0% and is not flaky

{color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 BuildFailureOnMessage |https://ci.ignite.apache.org/viewLog.html?buildId=4503633]]

{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 30|https://ci.ignite.apache.org/viewLog.html?buildId=4503630]]
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation2 - Test has
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation3 - Test has
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testConcurrentStartStop2_EventsThrottle
- Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testCustomEventsSimple1_5_Nodes
- Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testMultipleClusters - Test has
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testConnectionRestore_Coordinator1_1
- Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testWithPersistence1 - Test has
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testWithPersistence2 - Test has
low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop1 - Test has low
fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop2 - Test has low
fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testDeployService2 - Test has
low fail rate in base branch 0,0% and is not flaky
... and 19 tests blockers

{color:#d04437}Cache (Failover) 1{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503648]]
* IgniteCacheFailoverTestSuite: IgniteAtomicLongChangingTopologySelfTest.testClientQueueCreateCloseFailover
- Test has low fail rate in base branch 0,0% and is not flaky

{panel}
[TeamCity *--&gt; Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4503707&amp;buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on an old offline node breaks PME
> ------------------------------------------------------------------
>
>                 Key: IGNITE-9562
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9562
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.5
>            Reporter: Pavel Kovalenko
>            Assignee: Eduard Shangareev
>            Priority: Critical
>             Fix For: 2.8, 2.7.6
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins to cluster it starts all caches that it has seen before stopping.
> If that cache was cluster-widely destroyed it leads to breaking the crash recovery process
or PME.
> Root cause - we don't start/collect caches from the stopped node on another part of a
cluster.
> In case of PARTITIONED cache mode that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
> 	at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode that scenario breaks PME coordinator process:
> {noformat}
> [2018-09-12 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee25100001, messageType=class
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249)
> 	at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
> 	at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2249)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1628)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:141)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:368)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2999)
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2978)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> As one of the solutions - we shouldn't start such caches on resurrected nodes.
> We should save caches changes history somewhere and cluster-widely spread it to joining
nodes. 
> In a case when cache was only stopped, we can do nothing and start it lately when cache
start request received.
> In a case when cache was stopped & destroyed, we should clean persistence data for
that cache.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Mime
View raw message