Date: Sun, 18 Aug 2019 10:49:00 +0000 (UTC)
From: "Dmitriy Pavlov (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Commented] (IGNITE-9562) Destroyed cache that resurrected on an old offline node breaks PME

    [ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909937#comment-16909937 ]

Dmitriy Pavlov commented on IGNITE-9562:
----------------------------------------

According to Eduard's comments:

IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode - needs to be researched.

PDS 1 [ tests 5 ]
IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - can be ignored/failed because of https://issues.apache.org/jira/browse/IGNITE-8717

Cache 7 [ tests 2 ]
IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable - this is an issue and needs to be fixed.

> Destroyed cache that resurrected on an old offline node breaks PME
> ------------------------------------------------------------------
>
>                 Key: IGNITE-9562
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9562
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.5
>            Reporter: Pavel Kovalenko
>            Assignee: Eduard Shangareev
>            Priority: Critical
>             Fix For: 2.8, 2.7.6
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node joins the cluster, it starts all caches that it had seen before stopping.
> If such a cache was destroyed cluster-wide, this breaks the crash recovery process or PME.
> Root cause - we don't start/collect caches from the stopped node on the rest of the cluster.
> In case of PARTITIONED cache mode, that scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>     at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In case of REPLICATED cache mode, that scenario breaks the PME coordinator process:
> {noformat}
> [2018-09-12 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager] Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee25100001, messageType=class o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2249)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1628)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:141)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:368)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2999)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2978)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> As one of the solutions, we shouldn't start such caches on resurrected nodes.
> We should save the cache change history somewhere and spread it cluster-wide to joining nodes.
> If the cache was only stopped, we can do nothing and start it later, when a cache start request is received.
> If the cache was stopped and destroyed, we should clean the persistence data for that cache.
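
The proposal at the end of the description (keep a cache change history, hand it to joining nodes, then either defer the start or clean persistence data) could be modelled roughly as below. This is only a sketch of the decision logic in plain Java. It does not use Ignite's internal API and is not the fix that was committed for IGNITE-9562; CacheChangeHistory, JoinPlan, reconcile and the cache names are all hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the proposed join-time reconciliation; not Ignite code.
final class CacheHistorySketch {
    /** Lifecycle changes that the cluster would record while a node is offline. */
    enum Change { STOPPED, DESTROYED }

    /** Assumed cluster-wide history: the last recorded change per cache name. */
    static final class CacheChangeHistory {
        private final Map<String, Change> lastChange = new HashMap<>();

        void record(String cacheName, Change change) {
            lastChange.put(cacheName, change);
        }

        /** Returns the last recorded change, or null if the cache was never stopped or destroyed. */
        Change lastChangeFor(String cacheName) {
            return lastChange.get(cacheName);
        }
    }

    /** What a joining node should do with each cache it finds in its own persistence. */
    static final class JoinPlan {
        final Set<String> start = new TreeSet<>();            // still alive in the cluster
        final Set<String> deferStart = new TreeSet<>();       // only stopped: wait for a start request
        final Set<String> cleanPersistence = new TreeSet<>(); // destroyed: remove local data, never start
    }

    /** Compares locally persisted caches with the history received from the cluster on join. */
    static JoinPlan reconcile(Collection<String> locallyPersisted, CacheChangeHistory history) {
        JoinPlan plan = new JoinPlan();
        for (String name : locallyPersisted) {
            Change change = history.lastChangeFor(name);
            if (change == null) {
                plan.start.add(name);
            } else if (change == Change.STOPPED) {
                plan.deferStart.add(name);
            } else {
                plan.cleanPersistence.add(name);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        CacheChangeHistory history = new CacheChangeHistory();
        // While the node was offline, one cache was stopped and another was destroyed.
        history.record("stoppedCache", Change.STOPPED);
        history.record("destroyedCache", Change.DESTROYED);

        JoinPlan plan = reconcile(
            Arrays.asList("liveCache", "stoppedCache", "destroyedCache"), history);

        System.out.println("start: " + plan.start);
        System.out.println("defer: " + plan.deferStart);
        System.out.println("clean: " + plan.cleanPersistence);
    }
}
```

In this model the resurrected node never starts a destroyed cache on its own, which is the condition that leads to the assertion failures in both traces above. A cache that was destroyed and later re-created under the same name would need extra bookkeeping (for example a deployment id) that this sketch leaves out.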