ignite-issues mailing list archives

From "Alexey Goncharuk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-6113) Partition eviction prevents exchange from completion
Date Mon, 12 Feb 2018 13:16:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360722#comment-16360722 ]

Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------

Pavel,

1) The code around clearFuture looks suspicious to me: some of the clearFuture.onDone() calls are
synchronized, some are not. Also, note that there is a clearFuture.listen() call, and the listener may
be invoked either inside the synchronized block or outside it (if the listener is invoked from another
thread). In that case, the reset() call is likely not synchronized with the listener chain invocation.
2) The wait for partition clearing is synchronous and runs in the system pool; we must avoid this. In the best case
this will lead to a significant performance drop, in the worst case to a deadlock. The wait
should be asynchronous. We should probably also report some sort of partition clear progress
(or at least have a metric/MBean indicating that rebalancing won't start because we are waiting
for these partitions).
3) There is a suspicious getter, remaining(), in GridDhtPartitionDemander: the method is synchronized,
but it returns a reference to a map. What if the map changes afterwards?
4) Please add a specific test that reproduces the absence of PME while async eviction
is in progress. Also, we should add tests for the following partition state transitions:
MOVING->RENTING->MOVING->OWNING (add an optional node crash for each transition)
RENTING->MOVING->RENTING->EVICTED (add an optional node crash for each transition)
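
To illustrate the concern in point 3, here is a minimal sketch of why a synchronized getter that returns a map reference does not protect the caller, and how returning a snapshot avoids it. The class RemainingSnapshot, its methods, and the map's key and value types are hypothetical and are not taken from GridDhtPartitionDemander; the actual fix in Ignite may well look different.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder, NOT the real GridDhtPartitionDemander.
class RemainingSnapshot {
    // Partition id -> remaining size (types chosen for illustration only).
    private final Map<Integer, Long> remaining = new HashMap<>();

    // Problematic pattern: the lock is released as soon as the reference is
    // returned, so the caller observes later mutations without any locking.
    public synchronized Map<Integer, Long> remainingUnsafe() {
        return remaining;
    }

    // Safer pattern: copy under the lock and hand out a read-only snapshot.
    public synchronized Map<Integer, Long> remainingSnapshot() {
        return Collections.unmodifiableMap(new HashMap<>(remaining));
    }

    public synchronized void add(int part, long size) {
        remaining.put(part, size);
    }

    public synchronized void partitionDone(int part) {
        remaining.remove(part);
    }

    public static void main(String[] args) {
        RemainingSnapshot state = new RemainingSnapshot();
        state.add(1, 10L);

        Map<Integer, Long> live = state.remainingUnsafe();
        Map<Integer, Long> snap = state.remainingSnapshot();

        state.partitionDone(1);

        // The live reference changed under the caller; the snapshot did not.
        System.out.println("live=" + live + " snapshot=" + snap);
    }
}
```

The snapshot costs a copy per call, which is usually acceptable for a diagnostic getter; whether it is acceptable here depends on how often remaining() is called.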

> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
>                 Key: IGNITE-6113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6113
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vladislav Pyatkov
>            Assignee: Alexey Goncharuk
>            Priority: Major
>
> I have waited for 3 hours for completion without any success.
> The exchange-worker thread is blocked.
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0 tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efee630a7c0> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0 tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         - locked <0x00007f056161bf88> (a java.lang.Object)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
>         at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
>         at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
>         at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
>         at org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
>         - locked <0x00007f054d45bad8> (a org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
>         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
>         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
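
In the first dump above, exchange-worker is parked in GridFutureAdapter.get() (called from GridDhtPreloader.assign) while the sys thread in the second dump is still clearing the partition. The following is a rough sketch of the non-blocking alternative asked for in point 2 of the comment. It uses java.util.concurrent.CompletableFuture as a stand-in for Ignite's internal future type, and the class AsyncClearWait and its methods are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT Ignite code.
class AsyncClearWait {
    // Runs the (possibly slow) partition clear on the given pool and
    // returns a future that completes when the clear finishes.
    static CompletableFuture<Void> clearPartition(ExecutorService pool, Runnable clearWork) {
        return CompletableFuture.runAsync(clearWork, pool);
    }

    // Chains the follow-up step onto the clear future instead of calling
    // get() on it, so the calling thread is never parked.
    static CompletableFuture<String> startRebalanceAfterClear(CompletableFuture<Void> clearFut) {
        return clearFut.thenApply(ignored -> "rebalance-started");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Void> clearFut = clearPartition(pool, () -> {
                // Placeholder for the actual clearing work.
            });

            CompletableFuture<String> rebalance = startRebalanceAfterClear(clearFut);

            // Blocking here is only for the demo, so the JVM does not exit early.
            System.out.println(rebalance.get(5, TimeUnit.SECONDS));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the thread that schedules the rebalance returns immediately, and the continuation runs once the clear completes.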



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
