ignite-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-9040) StopNodeFailureHandler is not able to stop node correctly on node segmentation
Date Fri, 20 Jul 2018 14:37:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550827#comment-16550827 ]

ASF GitHub Bot commented on IGNITE-9040:
----------------------------------------

GitHub user sergey-chugunov-1985 opened a pull request:

    https://github.com/apache/ignite/pull/4395

    IGNITE-9040 new FailureHandler for node segmentation special case, test for the root cause error

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-9040

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4395
    
----
commit 1a76eab29100002f7b8925c051c76763e87511d4
Author: Sergey Chugunov <sergey.chugunov@...>
Date:   2018-07-20T14:27:05Z

    IGNITE-9040 new FailureHandler for node segmentation special case, test for the root cause error

----


> StopNodeFailureHandler is not able to stop node correctly on node segmentation
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9040
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9040
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.7
>
>
> When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are logged to the WAL even on node stop.
> With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop the segmented node, and it marks the node's state as invalid. As a result, all write requests to the WAL fail.
> So, as part of the stop-on-segmentation procedure, the node needs to log a TxRecord, but it cannot because its state is marked as invalid. The stop procedure therefore finishes incorrectly, and some threads started by the node are not cleaned up.
> Exception example:
> {noformat}
> [2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] Failed to pre-stop processor: GridProcessorAdapter []
> class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord [timestamp=1532082936349]]
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
> 	at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
> 	at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
> 	at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
> 	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
> 	at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
> 	at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
> 	at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
> 	at org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to perform WAL operation (environment was invalidated by a previous error)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
> 	... 15 more
> {noformat}
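
The ordering problem described in the issue can be illustrated with a small toy model. The sketch below does not use any Ignite classes: ToyWal, ToyNode, stopAfterInvalidation and stopWithoutInvalidation are hypothetical names invented for this illustration, and the model assumes, as a simplification, that a failed WAL write during stop skips the remaining cleanup. It is not the change made in the pull request; it only shows why invalidating the node before stopping it conflicts with a stop procedure that still has to write to the WAL.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentationStopSketch {
    /** Hypothetical toy WAL: rejects every write once the node is invalidated. */
    static class ToyWal {
        private boolean invalidated;
        final List<String> records = new ArrayList<>();

        void invalidate() {
            invalidated = true;
        }

        void log(String record) {
            if (invalidated)
                throw new IllegalStateException(
                    "Failed to perform WAL operation (environment was invalidated by a previous error)");
            records.add(record);
        }
    }

    /** Hypothetical toy node whose stop procedure must log a record before cleanup. */
    static class ToyNode {
        final ToyWal wal = new ToyWal();
        boolean workersStopped;

        /** Returns true only if the whole stop procedure completed. */
        boolean stop() {
            try {
                wal.log("TxRecord"); // Stand-in for the record logged on node stop.
            }
            catch (IllegalStateException e) {
                return false; // Simplification: cleanup below is skipped.
            }
            workersStopped = true;
            return true;
        }
    }

    /** Models the problematic order: mark the node invalid first, then stop it. */
    static boolean stopAfterInvalidation(ToyNode node) {
        node.wal.invalidate();
        return node.stop();
    }

    /** Models one conceivable alternative: stop without invalidating first. */
    static boolean stopWithoutInvalidation(ToyNode node) {
        return node.stop();
    }

    public static void main(String[] args) {
        ToyNode a = new ToyNode();
        boolean aDone = stopAfterInvalidation(a);
        System.out.println("invalidate then stop: completed=" + aDone
            + ", workersStopped=" + a.workersStopped);

        ToyNode b = new ToyNode();
        boolean bDone = stopWithoutInvalidation(b);
        System.out.println("stop only: completed=" + bDone
            + ", workersStopped=" + b.workersStopped);
    }
}
```

In this model the first node never reaches its cleanup step because the write it needs is rejected, which mirrors the "Failed to pre-stop processor" error above. How the real handler avoids this is defined by the pull request itself, not by this sketch.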



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
