ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pavel Kovalenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-9562) Destroyed cache that resurrected on a old offline node breaks PME
Date Wed, 12 Sep 2018 15:52:00 GMT
Pavel Kovalenko created IGNITE-9562:
---------------------------------------

             Summary: Destroyed cache that resurrected on a old offline node breaks PME
                 Key: IGNITE-9562
                 URL: https://issues.apache.org/jira/browse/IGNITE-9562
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 2.5
            Reporter: Pavel Kovalenko
             Fix For: 2.8


Given:
2 nodes, persistence enabled.
1) Stop 1 node
2) Destroy cache through client
3) Start stopped node

When the stopped node joins to cluster it starts all caches that it has seen before stopping.
If that cache was cluster-widely destroyed it leads to breaking the crash recovery process
or PME.

Root cause - we don't start/collect caches from the stopped node on another part of a cluster.

In case of PARTITIONED cache mode that scenario breaks crash recovery:
{noformat}
java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]

	at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

In case of REPLICATED cache mode that scenario breaks PME coordinator process:
{noformat}
[2018-09-12 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager]
Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee25100001, messageType=class
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
java.lang.AssertionError: 3080586
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2249)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1628)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:141)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:368)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2999)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2978)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

As one of the solutions - we shouldn't start such caches on resurrected nodes.
We should save caches changes history somewhere and cluster-widely spread it to joining nodes.

In a case when cache was only stopped, we can do nothing and start it lately when cache start
request received.
In a case when cache was stopped & destroyed, we should clean persistence data for that
cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message