ignite-dev mailing list archives

From Anton Vinogradov ...@apache.org>
Subject Re: Partition reserve/release asymmetry
Date Fri, 10 Jan 2020 12:21:24 GMT
Everything is fine.
Merged to master branch.

On Fri, Jan 10, 2020 at 9:48 AM Anton Vinogradov <av@apache.org> wrote:

> >> Does the issue reproduce in
> >> subsequent runs?
> Unfortunately no.
> We performed 30+ runs without "success".
>
> >> I think we can add an assertion to
> >> GridDhtLocalPartition#destroy() method to check that reservations is 0
> OK, I will check and merge in case of success.
> Created an issue to handle this [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12524
>
> On Thu, Jan 9, 2020 at 1:46 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
>
>> Hello Anton,
>>
>> Thanks for digging into this. The logic with checking the reservations
>> count seems fishy to me as well, so I have no objections to the suggested
>> change. This "if" statement does not answer why the partition was being
>> destroyed during the commit, though. Does the issue reproduce in
>> subsequent runs?
>>
>> The logic around reserve/release seems OK to me; however, the
>> eviction/renting code looks overly complicated. Perhaps there is a bug
>> somewhere there? I think we can add an assertion to the
>> GridDhtLocalPartition#destroy() method to check that reservations is 0
>> when this method is called (there is already a check for the EVICTED state
>> there).
>>
>> --AG
>>
>> On Thu, Jan 9, 2020 at 09:45, Anton Vinogradov <av@apache.org> wrote:
>>
>> > Folks,
>> > A Yardstick run (opt-serial-put-get-1-backup) failed with an interesting
>> > exception:
>> > Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a transaction has produced runtime exception]]
>> > class org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: Committing a transaction has produced runtime exception
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
>> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:838)
>> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:893)
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1452)
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1375)
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:123)
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:241)
>> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:239)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>> >     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1843)
>> >     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1468)
>> >     at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
>> >     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1365)
>> >     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:555)
>> >     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>> >     at java.lang.Thread.run(Thread.java:748)
>> > Caused by: java.lang.IllegalStateException: Tree is being concurrently destroyed: tx-p-470##CacheData
>> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.checkDestroyed(BPlusTree.java:1011)
>> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1831)
>> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1696)
>> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1679)
>> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4288)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4262)
>> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1540)
>> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:675)
>> >     ... 19 more
>> >
>> > It seems the BPlusTree was destroyed between
>> > GridDistributedTxRemoteAdapter.java:545 and
>> > GridDistributedTxRemoteAdapter.java:675 while the partition was reserved.
>> >
>> > See the full log [1] for details.
>> >
>> > During investigation weird code was found:
>> > private void release0(int sizeChange) {
>> >         while (true) {
>> >             long state = this.state.get();
>> >
>> >             int reservations = getReservations(state);
>> >
>> >             if (reservations == 0) // How can it be zero at release attempt?
>> >                 return;
>> >
>> > I've replaced this weird code with an assertion [2] and ran it on TeamCity
>> > twice; nothing failed.
>> >
>> > So, questions:
>> > 1) Any idea why we are able to have zero reservations at a release attempt?
>> > 2) Any objections to merging the assertion instead of the weird return to
>> > the master branch?
>> > 3) Any idea why the exception happens?
>> >
>> > [1] https://gist.githubusercontent.com/anton-vinogradov/834fc63114a3e8d46b89ea4ccec8148b/raw/6438930c7fef119d0ad60df76d821fe7bd100c5e/gistfile1.txt
>> > [2] https://gitbox.apache.org/repos/asf?p=ignite.git;a=commitdiff;h=b2c083564fb3b48ebe87042e0ed442dc0af3a74d
>> >
>>
>
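
The two checks discussed in this thread (failing on a release with zero
reservations instead of silently returning, and asserting in
GridDhtLocalPartition#destroy() that reservations is 0) can be illustrated
with a minimal, self-contained Java sketch. This is not the Ignite code: the
class PartitionSketch, the 16-bit counter layout, and the explicit
AssertionError (used in place of the assert keyword so the checks fire even
without -ea) are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a partition; NOT the real GridDhtLocalPartition. */
class PartitionSketch {
    /** Assumption of this sketch: the low 16 bits of the state hold the reservation count. */
    private static final long RESERVATIONS_MASK = 0xFFFFL;

    private final AtomicLong state = new AtomicLong();

    private volatile boolean destroyed;

    private static int getReservations(long state) {
        return (int) (state & RESERVATIONS_MASK);
    }

    int reservations() {
        return getReservations(state.get());
    }

    boolean isDestroyed() {
        return destroyed;
    }

    /** Increments the reservation count with a CAS loop. */
    void reserve() {
        while (true) {
            long cur = state.get();

            if (getReservations(cur) == RESERVATIONS_MASK)
                throw new IllegalStateException("Too many reservations");

            if (state.compareAndSet(cur, cur + 1))
                return;
        }
    }

    /** Decrements the reservation count; a release without a matching reserve fails loudly. */
    void release() {
        while (true) {
            long cur = state.get();

            // The original snippet returned silently here; the sketch raises an error instead.
            if (getReservations(cur) == 0)
                throw new AssertionError("Release without a matching reserve");

            if (state.compareAndSet(cur, cur - 1))
                return;
        }
    }

    /** Refuses to destroy while anyone still holds a reservation. */
    void destroy() {
        int reservations = reservations();

        if (reservations != 0)
            throw new AssertionError("Destroying a reserved partition, reservations=" + reservations);

        destroyed = true;
    }
}
```

With a check like this, a reserve/release asymmetry would surface at the
offending release() or destroy() call rather than later as a "Tree is being
concurrently destroyed" error during commit. Note that the sketch checks and
then sets the destroyed flag without synchronizing against concurrent
reserve() calls, so it shows only the shape of the assertions, not a complete
eviction protocol.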
