ignite-issues mailing list archives

From "Roman Kondakov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-10376) Failed to touch in CacheOffheapEvictionManager
Date Fri, 30 Nov 2018 12:25:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704658#comment-16704658 ]

Roman Kondakov edited comment on IGNITE-10376 at 11/30/18 12:24 PM:
--------------------------------------------------------------------

[~ivanan.fed], in my opinion the NPE here is a consequence, not the cause. Here is how I see
the situation:
 # Some set of user and system threads hangs on the binary metadata registration future: {{CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:495)}}
for an *unknown* reason.
 # The blocked system threads are detected by the failure handler watchdog - we can see this in
the log.
 # The grid remains hung until the test timeout fires.
 # On test timeout all nodes are stopped by the test framework. While the nodes are stopping, all managers
are stopped and cleaned up, including {{CacheOffheapEvictionManager}}. After this point, if we
request the eviction manager from the cache context, it will return {{null}}.
 # Along with stopping the managers, all waiting threads are interrupted - we can also see this
in the log: {{IgniteInterruptedCheckedException: Got interrupted while waiting for future to complete.}}
 # Having been interrupted, the threads release their locks and unblock other threads, which
continue to perform their tasks - sending messages etc.
 # Messages sent by these threads cannot be processed properly - the nodes are stopping, and the
managers, including {{CacheOffheapEvictionManager}}, are stopping too. Attempts to obtain
the eviction manager from the cache context fail - cache context is {{null}} at this point,
so we get an NPE.

As you can see, the NPE appears only at the very last stage. The main bug here is the unexplained
thread hang described in p. 1.
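
To make the sequence above easier to follow, here is a minimal, self-contained sketch of the same pattern in plain Java. It is only an illustration under assumptions, not Ignite code: {{ShutdownRaceSketch}}, {{CacheContext}}, {{EvictionManager}}, {{evicts()}}, {{stop()}} and {{simulate()}} are hypothetical names, and a never-completed {{CompletableFuture}} stands in for the metadata registration future. It models the manager reference being cleared on stop; the real failure may involve a different {{null}} reference.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicReference;

class ShutdownRaceSketch {
    /** Hypothetical stand-in for an eviction manager; not Ignite's class. */
    static class EvictionManager {
        void touch(String key) {
            // No-op: only the dereference matters for this sketch.
        }
    }

    /** Hypothetical stand-in for a cache context that hands out its managers. */
    static class CacheContext {
        private volatile EvictionManager evicts = new EvictionManager();

        EvictionManager evicts() {
            return evicts;
        }

        /** Models step 4: on node stop the manager reference is cleaned up. */
        void stop() {
            evicts = null;
        }
    }

    /** Runs the scenario once and reports what the worker thread observed. */
    static String simulate() {
        CacheContext ctx = new CacheContext();
        // Models step 1: a future that is never completed, so the worker hangs on it.
        CompletableFuture<Void> metaFut = new CompletableFuture<>();
        AtomicReference<String> outcome = new AtomicReference<>("not run");

        Thread worker = new Thread(() -> {
            try {
                metaFut.get();
            }
            catch (InterruptedException | ExecutionException e) {
                // Models steps 5-6: the wait is interrupted and the thread carries on.
            }

            try {
                // Models step 7: the manager is requested after the stop.
                ctx.evicts().touch("key");
                outcome.set("ok");
            }
            catch (NullPointerException e) {
                outcome.set("NPE");
            }
        });

        worker.start();
        ctx.stop();         // Managers are cleaned up first...
        worker.interrupt(); // ...and only then are the waiting threads interrupted.

        try {
            worker.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println("outcome: " + simulate()); // prints "outcome: NPE"
    }
}
```

Because {{stop()}} always runs before {{interrupt()}}, the worker can only get past the future after the manager reference has been cleared, so the NPE in this sketch is deterministic. The sketch says nothing about why the future never completes, which is still the open question from p. 1.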


> Failed to touch in CacheOffheapEvictionManager
> ----------------------------------------------
>
>                 Key: IGNITE-10376
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10376
>             Project: Ignite
>          Issue Type: Test
>    Affects Versions: 2.7
>            Reporter: Ivan Fedotov
>            Assignee: Ivan Fedotov
>            Priority: Blocker
>              Labels: MakeTeamcityGreenAgain, stability, test-fail
>         Attachments: IGNITE-10376 log.txt
>
>
> A BinaryObjectException sometimes appears in [testAtomicOnheapTwoBackupAsyncFullSync|https://ci.ignite.apache.org/viewLog.html?buildId=2398013&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ContinuousQuery4#testNameId3300126853696550025] at the [moment|https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryOrderingEventTest.java#L371] of CacheEntryProcessor invocation.
> {code}class org.apache.ignite.binary.BinaryObjectException: Failed to update meta data for type: org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryOrderingEventTest$QueryTestValue
> 	at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:516)
> 	at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.addMeta(CacheObjectBinaryProcessorImpl.java:194)
> 	at org.apache.ignite.internal.binary.BinaryContext.updateMetadata(BinaryContext.java:1332)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1815)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1150)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.invoke0(GridDhtAtomicCache.java:831)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.invoke(GridDhtAtomicCache.java:787)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1438)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1482)
> 	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1228)
> 	at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryOrderingEventTest$1.run(CacheContinuousQueryOrderingEventTest.java:373)
> 	at org.apache.ignite.testframework.GridTestUtils$7.call(GridTestUtils.java:1300)
> 	at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:84){code}
> This may be caused by missing locks in GridCacheMapEntry#touch(GridCacheMapEntry.java:5063).
> It seems that the test has not worked since MVCC was integrated into Continuous Query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
